Caching merged diffs

View: New views
13 Messages — Rating Filter:   Alert me  

Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,

In Differ, merged diffs are repeatedly calculated whenever we iterate
over changes. The merged versions of diffs can theoretically differ,
as the underlying text is passed in on each call. However, this text
always ends up being the raw file text, and we always update our diffs
whenever this text changes. As such, we can cache the merged version
of our diffs by disallowing the effectively unused texts argument.

I'm attaching two patches, the first of which implements this caching
of merged diffs, and the second of which provides a short-circuit for
two-way comparisons. On two-way diffs, these patches provide a very
slight speed improvement; on three-way diffs, the speed-up is easily
measurable -- around 5% in a few quick tests based on scrolling
through a three-way source file comparison.

Comments appreciated.

cheers,
Kai



_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

0001-Cache-merged-diffs-by-disallowing-arbitrary-text-seq.patch (10K) Download Attachment
0002-Don-t-merge-diffs-for-two-way-comparisons.patch (1K) Download Attachment

Re: Caching merged diffs

by Stephen Kennedy :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> In Differ, merged diffs are repeatedly calculated whenever we iterate
> over changes. The merged versions of diffs can theoretically differ,
> as the underlying text is passed in on each call. However, this text
> always ends up being the raw file text, and we always update our diffs
> whenever this text changes. As such, we can cache the merged version
> of our diffs by disallowing the effectively unused texts argument.
>
> I'm attaching two patches, the first of which implements this caching
> of merged diffs, and the second of which provides a short-circuit for
> two-way comparisons. On two-way diffs, these patches provide a very
> slight speed improvement; on three-way diffs, the speed-up is easily
> measurable -- around 5% in a few quick tests based on scrolling
> through a three-way source file comparison.

Thats a good idea. I've semi-deliberately not changed any of the diffing
code in ages because I've always intended to rewrite it with an asynchronous
interface. I think this is the only way to get acceptable performance on
large files/large change blocks.

It's reasonable then to put the diff computation & most of the i/o in a
subprocess, which would make the ui much more responsive.

Stephen.
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

Maybe I overlooked something, but it seems that passing 2 arguments to
seq_sequences_iter() is not needed.
Patch attached.

Cheers,
Piotr

Kai Willadsen wrote:

> Hi all,
>
> In Differ, merged diffs are repeatedly calculated whenever we iterate
> over changes. The merged versions of diffs can theoretically differ,
> as the underlying text is passed in on each call. However, this text
> always ends up being the raw file text, and we always update our diffs
> whenever this text changes. As such, we can cache the merged version
> of our diffs by disallowing the effectively unused texts argument.
>
> I'm attaching two patches, the first of which implements this caching
> of merged diffs, and the second of which provides a short-circuit for
> two-way comparisons. On two-way diffs, these patches provide a very
> slight speed improvement; on three-way diffs, the speed-up is easily
> measurable -- around 5% in a few quick tests based on scrolling
> through a three-way source file comparison.
>
> Comments appreciated.
>
> cheers,
> Kai
>  
> ------------------------------------------------------------------------
>
> _______________________________________________
> meld-list mailing list
> meld-list@...
> http://mail.gnome.org/mailman/listinfo/meld-list

From 7946d01f45df9af20fb15d559fefee9b88d7d2de Mon Sep 17 00:00:00 2001
From: Piotr Piastucki <the_leech@...>
Date: Tue, 14 Jul 2009 18:46:47 +0200
Subject: [PATCH] Remove texts from set_sequences_iter arguments

---
 diffutil.py |    4 ++--
 filediff.py |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/diffutil.py b/diffutil.py
index 3da1c1a..08b8470 100644
--- a/diffutil.py
+++ b/diffutil.py
@@ -248,7 +248,7 @@ class Differ(object):
                 for c in self._auto_merge(using, texts):
                     yield c
 
-    def set_sequences_iter(self, sequences, texts):
+    def set_sequences_iter(self, sequences):
         assert 0 <= len(sequences) <= 3
         self.diffs = [[], []]
         self.num_sequences = len(sequences)
@@ -260,6 +260,6 @@ class Differ(object):
             while work.next() == None:
                 yield None
             self.diffs[i] = matcher.get_difference_opcodes()
-        self._update_merge_cache(texts)
+        self._update_merge_cache(sequences)
         yield 1
 
diff --git a/filediff.py b/filediff.py
index 5e9441b..acc1546 100644
--- a/filediff.py
+++ b/filediff.py
@@ -591,7 +591,7 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
         yield _("[%s] Computing differences") % self.label_text
         panetext = [self._filter_text(p) for p in panetext]
         lines = map(lambda x: x.split("\n"), panetext)
-        step = self.linediffer.set_sequences_iter(lines, self._get_texts())
+        step = self.linediffer.set_sequences_iter(lines)
         while step.next() == None:
             yield 1
 
--
1.6.0.4


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/15 Piotr Piastucki <leech.miranda@...>:
> Hi,
>
> Maybe I overlooked something, but it seems that passing 2 arguments to
> seq_sequences_iter() is not needed.
> Patch attached.

I could swear that I had a reason for passing in text, but I can't
imagine what it was. Applied, thanks.

Kai
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/11 Stephen Kennedy <stevek@...>:

>> In Differ, merged diffs are repeatedly calculated whenever we iterate
>> over changes. The merged versions of diffs can theoretically differ,
>> as the underlying text is passed in on each call. However, this text
>> always ends up being the raw file text, and we always update our diffs
>> whenever this text changes. As such, we can cache the merged version
>> of our diffs by disallowing the effectively unused texts argument.
>>
>> I'm attaching two patches, the first of which implements this caching
>> of merged diffs, and the second of which provides a short-circuit for
>> two-way comparisons. On two-way diffs, these patches provide a very
>> slight speed improvement; on three-way diffs, the speed-up is easily
>> measurable -- around 5% in a few quick tests based on scrolling
>> through a three-way source file comparison.
>
> Thats a good idea. I've semi-deliberately not changed any of the diffing
> code in ages because I've always intended to rewrite it with an asynchronous
> interface. I think this is the only way to get acceptable performance on
> large files/large change blocks.
>
> It's reasonable then to put the diff computation & most of the i/o in a
> subprocess, which would make the ui much more responsive.

I couldn't agree more. This caching patch has the benefit that it
drops us to set_sequences_iter() and change_sequence() as the only
interface points that modify internal state. Turning these into
asynchronous methods shouldn't be *too* hard...

Kai
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/11 Stephen Kennedy <stevek@...>:
>> It's reasonable then to put the diff computation & most of the i/o in a
>> subprocess, which would make the ui much more responsive.
>>    
Regarding ui responsiveness, I am sending a patch that tries to improve
performance of inline highlighting.
Test case:
1) run "meld filediff.py meldapp.py"
2) type something in either of the panes
3) wait.... :)
I know it is not a real-life example, but it shows exactly what the
issue is.
Patch:
I added a simple cache that stores replace chunks. The cache is updated
and checked in _update_highlighting(). Chunks that did not change are
not taken into account and as a result inline highlighting becomes
somewhat incremental.
Moreover I could not find any place where "delete line", "conflict line"
or "replace line" tags are set so I changed this piece of code and
remove_tag() is called for "inline line" only.

Comments welcome.

Cheers,
Piotr


From 7e85c89f9831f5cc5280d75dc0ffae4ae66db234 Mon Sep 17 00:00:00 2001
From: Piotr Piastucki <the_leech@...>
Date: Fri, 17 Jul 2009 17:29:29 +0200
Subject: [PATCH] Inline highlighting performance improvements

---
 filediff.py |   92 +++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/filediff.py b/filediff.py
index acc1546..145d061 100644
--- a/filediff.py
+++ b/filediff.py
@@ -91,6 +91,7 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
         self._sync_vscroll_lock = False
         self._sync_hscroll_lock = False
         self.linediffer = diffutil.Differ()
+        self._inline_cache = []
         for text in self.textview:
             text.set_wrap_mode( self.prefs.edit_wrap_lines )
         for buf in self.textbuffer:
@@ -622,47 +623,70 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
             mgr.clear()
 
     def _update_highlighting(self):
-        for b in self.textbuffer:
-            taglist = ["delete line", "conflict line", "replace line", "inline line"]
-            table = b.get_tag_table()
-            for tagname in taglist:
-                tag = table.lookup(tagname)
+        alltexts = [b for b in self._get_texts(raw=1)]
+        alltags = [b.get_tag_table().lookup("inline line") for b in self.textbuffer]
+        clearchunks = len(self._inline_cache) != 0
+        if not clearchunks:
+            for b, tag in zip(self.textbuffer, alltags):
                 b.remove_tag(tag, b.get_start_iter(), b.get_end_iter() )
+        newcache = []
+        startlines = [0] * 3
         for chunk in self.linediffer.all_changes():
             for i,c in enumerate(chunk):
                 if c and c[0] == "replace":
                     bufs = self.textbuffer[1], self.textbuffer[i*2]
-                    #tags = [b.get_tag_table().lookup("replace line") for b in bufs]
-                    starts = [b.get_iter_at_line(l) for b,l in zip(bufs, (c[1],c[3])) ]
-                    text1 = "\n".join( self._get_texts(raw=1)[1  ][c[1]:c[2]] ).encode("utf16")
-                    text1 = struct.unpack("%iH"%(len(text1)/2), text1)[1:]
-                    textn = "\n".join( self._get_texts(raw=1)[i*2][c[3]:c[4]] ).encode("utf16")
-                    textn = struct.unpack("%iH"%(len(textn)/2), textn)[1:]
-
-                    tags = [b.get_tag_table().lookup("inline line") for b in bufs]
-                    # For very long sequences, bail rather than trying a very slow comparison
-                    inline_limit = 8000 # arbitrary constant
-                    if len(text1) > inline_limit and len(textn) > inline_limit:
-                        ends = [b.get_iter_at_line(l) for b, l in zip(bufs, (c[2], c[4]))]
-                        for i in range(2):
-                            bufs[i].apply_tag(tags[i], starts[i], ends[i])
-                        continue
-
-                    matcher = difflib.SequenceMatcher(None, text1, textn)
-                    #print "<<<\n%s\n---\n%s\n>>>" % (text1, textn)
-                    back = (0,0)
-                    for o in matcher.get_opcodes():
-                        if o[0] == "equal":
-                            if (o[2]-o[1] < 3) or (o[4]-o[3] < 3):
-                                back = o[4]-o[3], o[2]-o[1]
+                    text1 = alltexts[1][c[1]:c[2]]
+                    textn = alltexts[i*2][c[3]:c[4]]
+                    cacheitem = (i, c, text1, textn)
+                    newcache.append(cacheitem)
+                    notincache = not cacheitem in self._inline_cache
+                    if clearchunks:
+                        # clean all previous non-replace chunks + this chunk if it was not found in the cache
+                        bufs[1].remove_tag(alltags[i*2], get_iter_at_line_or_eof(bufs[1], startlines[i*2]), get_iter_at_line_or_eof(bufs[1], c[3 + int(notincache)]))
+                        bufs[0].remove_tag(alltags[1], get_iter_at_line_or_eof(bufs[0], startlines[1]), get_iter_at_line_or_eof(bufs[0], c[1 + int(notincache)]))
+                    startlines[i*2] = c[4]
+                    startlines[1] = c[2]
+
+                    if notincache:
+                        tags = alltags[1], alltags[i*2]
+                        #tags = [b.get_tag_table().lookup("replace line") for b in bufs]
+                        starts = [get_iter_at_line_or_eof(b, l) for b,l in zip(bufs, (c[1],c[3])) ]
+                        text1 = "\n".join(alltexts[1][c[1]:c[2]] ).encode("utf16")
+                        text1 = struct.unpack("%iH"%(len(text1)/2), text1)[1:]
+                        textn = "\n".join(alltexts[i*2][c[3]:c[4]] ).encode("utf16")
+                        textn = struct.unpack("%iH"%(len(textn)/2), textn)[1:]
+
+                        # For very long sequences, bail rather than trying a very slow comparison
+                        inline_limit = 8000 # arbitrary constant
+                        if len(text1) > inline_limit and len(textn) > inline_limit:
+                            ends = [get_iter_at_line_or_eof(b, l) for b, l in zip(bufs, (c[2], c[4]))]
+                            for i in range(2):
+                                bufs[i].apply_tag(tags[i], starts[i], ends[i])
                             continue
-                        for i in range(2):
-                            s,e = starts[i].copy(), starts[i].copy()
-                            s.forward_chars( o[1+2*i] - back[i] )
-                            e.forward_chars( o[2+2*i] )
-                            bufs[i].apply_tag(tags[i], s, e)
+
+                        matcher = difflib.SequenceMatcher(None, text1, textn)
+                        #print "<<<\n%s\n---\n%s\n>>>" % (text1, textn)
                         back = (0,0)
-                    yield 1
+                        for o in matcher.get_opcodes():
+                            if o[0] == "equal":
+                                if (o[2]-o[1] < 3) or (o[4]-o[3] < 3):
+                                    back = o[4]-o[3], o[2]-o[1]
+                                continue
+                            for i in range(2):
+                                s,e = starts[i].copy(), starts[i].copy()
+                                s.forward_chars( o[1+2*i] - back[i] )
+                                e.forward_chars( o[2+2*i] )
+                                bufs[i].apply_tag(tags[i], s, e)
+                            back = (0,0)
+                        yield 1
+
+        # clean up all tailing lines
+        if clearchunks:
+            for b, tag, start in zip(self.textbuffer, alltags, startlines):
+                b.remove_tag(tag, get_iter_at_line_or_eof(b, start), b.get_end_iter() )
+        self._inline_cache = newcache
+        yield 1
+
 
     def on_textview_expose_event(self, textview, event):
         if self.num_panes == 1:
--
1.6.0.4


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/18 Piotr Piastucki <leech.miranda@...>:

> 2009/7/11 Stephen Kennedy <stevek@...>:
>>>
>>> It's reasonable then to put the diff computation & most of the i/o in a
>>> subprocess, which would make the ui much more responsive.
>>>
>
> Regarding ui responsiveness, I am sending a patch that tries to improve
> performance of inline highlighting.
> Test case:
> 1) run "meld filediff.py meldapp.py"
> 2) type something in either of the panes
> 3) wait.... :)
> I know it is not a real-life example, but it shows exactly what the issue
> is.
> Patch:
> I added a simple cache that stores replace chunks. The cache is updated and
> checked in _update_highlighting(). Chunks that did not change are not taken
> into account and as a result inline highlighting becomes somewhat
> incremental.
Amusingly enough, I looked at exactly the same issue yesterday. I went
for a simpler approach though (patch attached), and just cached the
SequenceMatcher calls; I'm pretty sure that that's the only really
expensive part of the updating. My version is missing cache-clearing,
but that's fairly easy to add.

> Moreover I could not find any place where "delete line", "conflict line" or
> "replace line" tags are set so I changed this piece of code and remove_tag()
> is called for "inline line" only.

Right - that's a separate issue, and should be fixed regardless. I
believe that of the tags we currently set up, only edited and inline
are used. One could imagine that tagging the whole buffer
appropriately might be useful in some way... but I'm not really sure
how.

Kai

[0001-Cache-inline-diff-results-to-avoid-rediffing-unchang.patch]

From 2ffcbe2c770582a315e50a6541859b5846b18fce Mon Sep 17 00:00:00 2001
From: Kai Willadsen <kai.willadsen@...>
Date: Fri, 17 Jul 2009 12:58:39 +1000
Subject: [PATCH] Cache inline-diff results to avoid rediffing unchanged blocks

---
 filediff.py |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/filediff.py b/filediff.py
index acc1546..e4ccdff 100644
--- a/filediff.py
+++ b/filediff.py
@@ -46,6 +46,19 @@ gdk = gtk.gdk
 
 MASK_SHIFT, MASK_CTRL = 1, 2
 
+class CachedSequenceMatcher(object):
+    def __init__(self):
+        self.cache = {}
+
+    def __call__(self, text1, textn):
+        try:
+            return self.cache[(text1, textn)]
+        except KeyError:
+            matcher = difflib.SequenceMatcher(None, text1, textn)
+            opcodes = matcher.get_opcodes()
+            self.cache[(text1, textn)] = opcodes
+            return opcodes
+
 def get_iter_at_line_or_eof(buffer, line):
     if line >= buffer.get_line_count():
         return buffer.get_end_iter()
@@ -91,6 +104,7 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
         self._sync_vscroll_lock = False
         self._sync_hscroll_lock = False
         self.linediffer = diffutil.Differ()
+        self.cached_match = CachedSequenceMatcher()
         for text in self.textview:
             text.set_wrap_mode( self.prefs.edit_wrap_lines )
         for buf in self.textbuffer:
@@ -648,10 +662,9 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
                             bufs[i].apply_tag(tags[i], starts[i], ends[i])
                         continue
 
-                    matcher = difflib.SequenceMatcher(None, text1, textn)
                     #print "<<<\n%s\n---\n%s\n>>>" % (text1, textn)
                     back = (0,0)
-                    for o in matcher.get_opcodes():
+                    for o in self.cached_match(text1, textn):
                         if o[0] == "equal":
                             if (o[2]-o[1] < 3) or (o[4]-o[3] < 3):
                                 back = o[4]-o[3], o[2]-o[1]
--
1.6.0.6



_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Kai Willadsen wrote:
> Amusingly enough, I looked at exactly the same issue yesterday. I went
> for a simpler approach though (patch attached), and just cached the
> SequenceMatcher calls; I'm pretty sure that that's the only really
> expensive part of the updating. My version is missing cache-clearing,
> but that's fairly easy to add.
>  
Hi,

I did a quick test (run meld filediff.py meldapp.py, add a single
character) and measured the time taken by _update_highlighting():
meld 1.3 - 1160ms
my patch - 12 ms
Kai's patch - 52ms

No matter which patch is applied the performance gain is *huge*,
however, Kai's code looks much better :)

Cheers,
Piotr


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/18 Piotr Piastucki <leech.miranda@...>:
> I did a quick test (run meld filediff.py meldapp.py, add a single character)
> and measured the time taken by _update_highlighting():
> meld 1.3 - 1160ms
> my patch - 12 ms
> Kai's patch - 52ms

I'd be interested to know how you did these tests. I've had real
problems getting reliable results when testing performance
improvements.

> No matter which patch is applied the performance gain is *huge*, however,
> Kai's code looks much better :)

Cheers. I've attached a second-pass patch with very simple cache
eviction, and the results seem to be pretty promising. I'm sure it
could be tweaked, but there's no doubt that it's a huge improvement.

Kai


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

0001-Cache-inline-diff-results-to-avoid-rediffing-blocks.patch (4K) Download Attachment

Re: Caching merged diffs

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Kai Willadsen wrote:
> I'd be interested to know how you did these tests. I've had real
> problems getting reliable results when testing performance
> improvements.
>
>  

I just measured the time taken by _update_highlighting(). This time it
was quite easy as the results are 100% reproducible, which is not always
the case. E.g. sometimes I need to add some "warm-up" calls as the first
executions of methods tend to be much slower.

One thing I have just noticed about my patch - it is a bit more flexible
when the highlighting code changes (remember
http://bugzilla.gnome.org/show_bug.cgi?id=587410 ?) as caching does not
rely on the sequence matching algorithm.

Regards,
Piotr


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/20 Piotr Piastucki <leech.miranda@...>:
> One thing I have just noticed about my patch - it is a bit more flexible
> when the highlighting code changes (remember
> http://bugzilla.gnome.org/show_bug.cgi?id=587410 ?) as caching does not rely
> on the sequence matching algorithm.

It's true that my method would need to be updated slightly for changes
to the highlighting code, but it wouldn't be difficult to do so. In
fact, a second cache for storing differences between word-level tokens
might be a huge benefit in some fairly common cases. (I'm thinking of
comparisons that consist of lots of common prefix changes, like log
files in which only dates or file paths differ.)

One downside to your method is that chunk position is part of the
cache. The most obvious issue with this is that any change that
inserts or deletes lines causes lots of rediffing. For example,
instead of typing a single character in your "meld filediff.py
meldapp.py" example, just hit enter.

I'm going to look at whether we can combine parts of your approach to
avoid rehighlighting at all with my simpler caching mechanism.

cheers,
Kai
_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Piotr Piastucki-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Mon, Jul 20, 2009 at 2:44 AM, Kai Willadsen <kai.willadsen@...> wrote:

One downside to your method is that chunk position is part of the
cache. The most obvious issue with this is that any change that
inserts or deletes lines causes lots of rediffing. For example,
instead of typing a single character in your "meld filediff.py
meldapp.py" example, just hit enter.

Yes, that's why I am fully in favour of your approach :)
Just wanted to mention one rather minor (as the highlighting code should not change often) disadvantage.

Regards,
Piotr


_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

Re: Caching merged diffs

by Kai Willadsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/7/20 Piotr Piastucki <leech.miranda@...>:

> On Mon, Jul 20, 2009 at 2:44 AM, Kai Willadsen <kai.willadsen@...>
> wrote:
>>
>> One downside to your method is that chunk position is part of the
>> cache. The most obvious issue with this is that any change that
>> inserts or deletes lines causes lots of rediffing. For example,
>> instead of typing a single character in your "meld filediff.py
>> meldapp.py" example, just hit enter.
>
> Yes, that's why I am fully in favour of your approach :)
> Just wanted to mention one rather minor (as the highlighting code should not
> change often) disadvantage.
Well, I decided that your approach was well worth it too. :)

I tested with a few different cases, and there are certainly benefits
to doing as little work as possible. For example, in the case of
tokenising the input text, just the splitting process can take a while
on large change blocks... and my approach doesn't skip that.

I've reworked your approach, and integrated my caching with it. I had
to change the cache eviction, and while it has some flaws, I think
it's pretty good without making the SequenceMatcher calls async or
getting overcomplicated.

Patches attached.

Kai



_______________________________________________
meld-list mailing list
meld-list@...
http://mail.gnome.org/mailman/listinfo/meld-list

0001-Avoid-retagging-whole-textbuffers-if-diff-chunks-hav.patch (6K) Download Attachment
0002-Cache-inline-diff-results-to-avoid-rediffing-blocks.patch (5K) Download Attachment