Cleaning up Differ

View: New views
8 Messages — Rating Filter:   Alert me  

Cleaning up Differ

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

Attached for review are the first few patches in a series aimed at
cleaning up diffutil.Differ. These include some cleanup of unused
arguments and return values, as well as some small refactoring.

Comments appreciated.

Kai




_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

0001-Remove-unused-change-range-arguments-and-cleanup-app.patch (5K) Download Attachment
0002-Refactor-_locate_chunk.patch (1K) Download Attachment
0003-Simplify-creation-of-diff-chunks.patch (1K) Download Attachment

Re: Cleaning up Differ

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I am attaching yet another patch that simplifies Differ class.

Cheers,
Piotr


Kai Willadsen wrote:

> Hi,
>
> Attached for review are the first few patches in a series aimed at
> cleaning up diffutil.Differ. These include some cleanup of unused
> arguments and return values, as well as some small refactoring.
>
> Comments appreciated.
>
> Kai
>  
> ------------------------------------------------------------------------
>
> _______________________________________________
> meld-list mailing list
> meld-list@...
> http://mail.gnome.org/mailman/listinfo/meld-list

From 7ad31ac0f501023700d970339f29cc356aee9308 Mon Sep 17 00:00:00 2001
From: Piotr Piastucki <the_leech@...>
Date: Sun, 7 Jun 2009 09:11:11 +0200
Subject: [PATCH] Simplify Differ

---
 diffutil.py |   26 ++++++++------------------
 1 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/diffutil.py b/diffutil.py
index ab88818..66f3b9a 100644
--- a/diffutil.py
+++ b/diffutil.py
@@ -172,25 +172,17 @@ class Differ(object):
                     c = cs[1]
                     yield c[0], c[1], c[2], c[3], c[4], 2
 
-    def _merge_blocks(self, using, low_seq, high_seq, last_diff):
+    def _merge_blocks(self, using):
         LO, HI = 1,2
-        lowc  = using[low_seq][ 0][LO]
-        highc = using[low_seq][-1][HI]
-        if len(using[not low_seq]):
-            lowc  =  min(lowc,  using[not low_seq][ 0][LO])
-            highc =  max(highc, using[not low_seq][-1][HI])
+        lowc  =  min(using[0][ 0][LO], using[1][ 0][LO])
+        highc =  max(using[0][-1][HI], using[1][-1][HI])
         low = []
         high = []
         for i in (0,1):
-            if len(using[i]):
-                d = using[i][0]
-                low.append(  lowc  - d[LO] + d[2+LO] )
-                d = using[i][-1]
-                high.append( highc - d[HI] + d[2+HI] )
-            else:
-                d = last_diff
-                low.append(  lowc  - d[LO] + d[2+LO] )
-                high.append( highc - d[HI] + d[2+HI] )
+            d = using[i][0]
+            low.append(  lowc  - d[LO] + d[2+LO] )
+            d = using[i][-1]
+            high.append( highc - d[HI] + d[2+HI] )
         return low[0], high[0], lowc, highc, low[1], high[1]
 
     def _merge_diffs(self, seq0, seq1, texts):
@@ -231,8 +223,6 @@ class Differ(object):
                     high_diff = other_diff
                     high_mark = other_diff[HI]
 
-            block = self._merge_blocks( using, base_seq, high_seq, block)
-
             if len(using[0])==0:
                 assert len(using[1])==1
                 yield None, using[1][0]
@@ -240,7 +230,7 @@ class Differ(object):
                 assert len(using[0])==1
                 yield using[0][0], None
             else:
-                l0, h0, l1, h1, l2, h2 = block
+                l0, h0, l1, h1, l2, h2 = block = self._merge_blocks( using )
                 if h0-l0 == h2-l2 and texts[0][l0:h0] == texts[2][l2:h2]:
                     if l1 != h1:
                         out0 = ('replace', block[2], block[3], block[0], block[1])
--
1.6.0.4


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/6/7 Piotr Piastucki <leech.miranda@...>:
> Hi,
>
> I am attaching yet another patch that simplifies Differ class.

I had a look at this patch (this is from bug 577092, right?)
yesterday. In quick testing, it seems like the new code has equivalent
effect, but I don't think that this is obvious from the patch.

For example, I couldn't convince myself that moving the call to
self._merge_blocks() didn't change behaviour. Some explanation of the
logic behind the change would really help here.

Kai
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I am a bit surprised that the patch is not self-explanatory, because the
code in diffutil is quite simple after all, but here is the info you
requested:

1) block is not need if
len(using[0])==0 OR len(using[1])==0
because of:
...
            block = self._merge_blocks( using, base_seq, high_seq, block)

            if len(using[0])==0:
                assert len(using[1])==1
                yield None, using[1][0]
            elif len(using[1])==0:
                assert len(using[0])==1
                yield using[0][0], None
            else:
...
so moving the call here does not change anything:
...
                yield using[0][0], None
            else:
                block = self._merge_blocks( using, base_seq, high_seq,
block)
...


2) Hence self._merge_blocks is now called if and only if both
len(using[0]) and len(using[1]) are != 0, then the following checks in
merge_blocks() become redundant:
a)        if len(using[not low_seq]):
b)        if len(using[i]):

As a result merge_blocks() may be simplified to:

    def _merge_blocks(self, using, low_seq, high_seq, last_diff):
         LO, HI = 1,2
         lowc  =  min(using[low_seq][ 0][LO],  using[not low_seq][ 0][LO])
         highc =  max(using[low_seq][-1][HI], using[not low_seq][-1][HI])
         lowc  =  min(using[0][ 0][LO], using[1][ 0][LO])
         highc =  max(using[0][-1][HI], using[1][-1][HI])
         low = []
         high = []
         for i in (0,1):
            if len(using[i]):
                d = using[i][0]
                low.append(  lowc  - d[LO] + d[2+LO] )
                d = using[i][-1]
                high.append( highc - d[HI] + d[2+HI] )
         return low[0], high[0], lowc, highc, low[1], high[1]


3) Finally, optimizing min/max into a single line is pretty
straight-forward.

Should you have any questions please let me know.

Thanks,
Piotr



Kai Willadsen wrote:

> 2009/6/7 Piotr Piastucki <leech.miranda@...>:
>  
>> Hi,
>>
>> I am attaching yet another patch that simplifies Differ class.
>>    
>
> I had a look at this patch (this is from bug 577092, right?)
> yesterday. In quick testing, it seems like the new code has equivalent
> effect, but I don't think that this is obvious from the patch.
>
> For example, I couldn't convince myself that moving the call to
> self._merge_blocks() didn't change behaviour. Some explanation of the
> logic behind the change would really help here.
>
> Kai
>  

_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

A small correction - the code after removing the checks in 2) will be of
course as follows:

   def _merge_blocks(self, using, low_seq, high_seq, last_diff):
        LO, HI = 1,2
        lowc  = using[low_seq][ 0][LO]
        highc = using[low_seq][-1][HI]
        lowc  =  min(lowc,  using[not low_seq][ 0][LO])
        highc =  max(highc, using[not low_seq][-1][HI])
        low = []
        high = []
        for i in (0,1):
           if len(using[i]):
               d = using[i][0]
               low.append(  lowc  - d[LO] + d[2+LO] )
               d = using[i][-1]
               high.append( highc - d[HI] + d[2+HI] )
        return low[0], high[0], lowc, highc, low[1], high[1]

Cheers,
Piotr

Piotr Piastucki wrote:

> Hi,
>
> I am a bit surprised that the patch is not self-explanatory, because
> the code in diffutil is quite simple after all, but here is the info
> you requested:
>
> 1) block is not need if
> len(using[0])==0 OR len(using[1])==0
> because of:
> ...
>            block = self._merge_blocks( using, base_seq, high_seq, block)
>
>            if len(using[0])==0:
>                assert len(using[1])==1
>                yield None, using[1][0]
>            elif len(using[1])==0:
>                assert len(using[0])==1
>                yield using[0][0], None
>            else:
> ...
> so moving the call here does not change anything:
> ...
>                yield using[0][0], None
>            else:
>                block = self._merge_blocks( using, base_seq, high_seq,
> block)
> ...
>
>
> 2) Hence self._merge_blocks is now called if and only if both
> len(using[0]) and len(using[1]) are != 0, then the following checks in
> merge_blocks() become redundant:
> a)        if len(using[not low_seq]):
> b)        if len(using[i]):
>
> As a result merge_blocks() may be simplified to:
>
>    def _merge_blocks(self, using, low_seq, high_seq, last_diff):
>         LO, HI = 1,2
>         lowc  =  min(using[low_seq][ 0][LO],  using[not low_seq][ 0][LO])
>         highc =  max(using[low_seq][-1][HI], using[not low_seq][-1][HI])
>         lowc  =  min(using[0][ 0][LO], using[1][ 0][LO])
>         highc =  max(using[0][-1][HI], using[1][-1][HI])
>         low = []
>         high = []
>         for i in (0,1):
>            if len(using[i]):
>                d = using[i][0]
>                low.append(  lowc  - d[LO] + d[2+LO] )
>                d = using[i][-1]
>                high.append( highc - d[HI] + d[2+HI] )
>         return low[0], high[0], lowc, highc, low[1], high[1]
>
>
> 3) Finally, optimizing min/max into a single line is pretty
> straight-forward.
>
> Should you have any questions please let me know.
>
> Thanks,
> Piotr
>
>
>
> Kai Willadsen wrote:
>> 2009/6/7 Piotr Piastucki <leech.miranda@...>:
>>  
>>> Hi,
>>>
>>> I am attaching yet another patch that simplifies Differ class.
>>>    
>>
>> I had a look at this patch (this is from bug 577092, right?)
>> yesterday. In quick testing, it seems like the new code has equivalent
>> effect, but I don't think that this is obvious from the patch.
>>
>> For example, I couldn't convince myself that moving the call to
>> self._merge_blocks() didn't change behaviour. Some explanation of the
>> logic behind the change would really help here.
>>
>> Kai
>>  
>

_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/6/7 Piotr Piastucki <leech.miranda@...>:
> I am a bit surprised that the patch is not self-explanatory, because the
> code in diffutil is quite simple after all, but here is the info you
> requested:

Parts 2 and 3 are indeed obvious, given that part 1 is correct.

> 1) block is not need if
> len(using[0])==0 OR len(using[1])==0
> because of:
> ...
>           block = self._merge_blocks( using, base_seq, high_seq, block)
>
>           if len(using[0])==0:
>               assert len(using[1])==1
>               yield None, using[1][0]
>           elif len(using[1])==0:
>               assert len(using[0])==1
>               yield using[0][0], None
>           else:
> ...
> so moving the call here does not change anything:
> ...
>               yield using[0][0], None
>           else:
>               block = self._merge_blocks( using, base_seq, high_seq, block)
> ...

This isn't (by itself) true. The call you're moving assigns to block,
which is used in subsequent calls to _merge_blocks(). In order to not
change behaviour, either the value of block needs to be unchanged by
the cases that you're now skipping, or the changes to block need to be
irrelevant. The commit message should include some brief explanation
about why this is the case.

Kai
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


> This isn't (by itself) true. The call you're moving assigns to block,
> which is used in subsequent calls to _merge_blocks(). In order to not
> change behaviour, either the value of block needs to be unchanged by
> the cases that you're now skipping, or the changes to block need to be
> irrelevant. The commit message should include some brief explanation
> about why this is the case.
>
> Kai
>  

Ok, I agree this part may not look obvious, so I will provide further
explanation. Basically, the previous value of block is irrelevant and
should not be passed, because:

1) last_diff (previous value of block) is used in merge_blocks() only if
len(using[0]) or len(using[1]) is 0:
...
            if len(using[i]):
                d = using[i][0]
                low.append(  lowc  - d[LO] + d[2+LO] )
                d = using[i][-1]
                high.append( highc - d[HI] + d[2+HI] )
            else:
                d = last_diff

...

The value calculated and returned by merge_blocks() in this case is
obviously not used to compute the result of merge_diffs()

2) No matter what previous value of block is passed to merge_blocks() it
will not be used when both len(using[0]) and len(using[1]) are not 0.
This is obviously the only case when the result returned by
merge_blocks() will affect the result of merge_diffs().

3) Having said that it seems to be reasonable to assume that there is no
need to pass any previous value of block to merge_blocks(), because it
will never affect the result of merge_diffs() which is actually the only
caller of merge_blocks().

Please let me know your thoughts.

Thanks,
Piotr
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Cleaning up Differ

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/6/7 Piotr Piastucki <leech.miranda@...>:

>
>> This isn't (by itself) true. The call you're moving assigns to block,
>> which is used in subsequent calls to _merge_blocks(). In order to not
>> change behaviour, either the value of block needs to be unchanged by
>> the cases that you're now skipping, or the changes to block need to be
>> irrelevant. The commit message should include some brief explanation
>> about why this is the case.
>>
>> Kai
>>
>
> Ok, I agree this part may not look obvious, so I will provide further
> explanation. Basically, the previous value of block is irrelevant and should
> not be passed, because:
>
> 1) last_diff (previous value of block) is used in merge_blocks() only if
> len(using[0]) or len(using[1]) is 0:
> ...
>           if len(using[i]):
>               d = using[i][0]
>               low.append(  lowc  - d[LO] + d[2+LO] )
>               d = using[i][-1]
>               high.append( highc - d[HI] + d[2+HI] )
>           else:
>               d = last_diff
>
> ...
>
> The value calculated and returned by merge_blocks() in this case is
> obviously not used to compute the result of merge_diffs()
>
> 2) No matter what previous value of block is passed to merge_blocks() it
> will not be used when both len(using[0]) and len(using[1]) are not 0. This
> is obviously the only case when the result returned by merge_blocks() will
> affect the result of merge_diffs().
>
> 3) Having said that it seems to be reasonable to assume that there is no
> need to pass any previous value of block to merge_blocks(), because it will
> never affect the result of merge_diffs() which is actually the only caller
> of merge_blocks().
>
> Please let me know your thoughts.
Thanks - that's exactly the kind of reasoning I was looking for. I've
attached an updated version of the patch including a brief version of
this reasoning in the commit message. There's also a small additional
cleanup attached.

Further review and testing would be much appreciated.

cheers,
Kai



_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

0001-Simplifying-diff-chunk-merging.patch (3K) Download Attachment
0002-Clean-up-unused-assignments.patch (1K) Download Attachment