Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

View: New views
8 Messages — Rating Filter:   Alert me  

Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Chris Newcombe-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Please could the isolation behavior be officially documented for
ets:insert, ets:insert_new, and ets:delete_all_objects?

I'd like to insert multiple items into an ETS table with both
atomicity and isolation.
i.e. I don't want concurrent readers to observe in-progress writes in
which some but not all of the objects have been inserted.

The ETS man pages don't explicitly say that either ets:insert/2 or
ets:insert_new/2 are isolated in this regard.

delete_all_objects(Tab) is documented as "The deletion is atomic."
But the unqualified word "atomic" doesn't necessarily imply isolation.
 e.g. It could reasonably mean that the delete operation has
run-to-completion-without-errors semantics, without saying anything
about whether concurrent readers are able to observe intermediate
states during the deletion.

Slide #40 of Ulf's recent presentation on multicore (
http://ulf.wiger.net/weblog/wp-content/uploads/2009/01/damp09-erlang-multicore.pdf
)
says that "ets:insert(Objects) is atomic". But that property doesn't
seem to be documented in the ETS man-pages (see below) -- and it again
"atomic" doesn't necessarily imply isolation anyway..

Ulf's code on that slide suggests he was actually talking about
insert_new.  The documentation for insert_new does say that all of the
collision-tests are done before anything is inserted, so it is atomic
with respect to the behavior of that particular insert_new operation
(all items will be inserted or none will, and conflicting writes are
prevented).  But it doesn't actually say that while the items are
being inserted, concurrent readers cannot observe intermediate states
(subsets of the new rows).

If necessary I will work around the issue with ets:replace trick.  But
as that has it's own messy issues (see
http://erlang.org/pipermail/erlang-questions/2007-April/025866.html ),
I'd really like to avoid it if possible.

many thanks,

Chris

From R13B01 man page:

insert(Tab, ObjectOrObjects) -> true

Types:

Tab = tid() | atom()
ObjectOrObjects = tuple() | [tuple()]

Inserts the object or all of the objects in the list ObjectOrObjects
into the table Tab. If the table is a set and the key of the inserted
objects matches the key of any object in the table, the old object
will be replaced. If the table is an ordered_set and the key of the
inserted object compares equal to the key of any object in the table,
the old object is also replaced. If the list contains more than one
object with matching keys and the table is a set, one will be
inserted, which one is not defined. The same thing holds for
ordered_set, but will also happen if the keys compare equal.

insert_new(Tab, ObjectOrObjects) -> bool()

Types:

Tab = tid() | atom()
ObjectOrObjects = tuple() | [tuple()]

This function works exactly like insert/2, with the exception that
instead of overwriting objects with the same key (in the case of set
or ordered_set) or adding more objects with keys already existing in
the table (in the case of bag and duplicate_bag), it simply returns
false. If ObjectOrObjects is a list, the function checks every key
prior to inserting anything. Nothing will be inserted if not all keys
present in the list are absent from the table.

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Ulf Wiger-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris Newcombe wrote:
> Please could the isolation behavior be officially documented for
> ets:insert, ets:insert_new, and ets:delete_all_objects?

Granted, you're asking for official documentation, and I
agree that this is important.

Meanwhile, the implementation (R13B01) of ets:insert/2
takes a table lock if the second argument is a non-empty
list in order to ensure atomicity. Same with insert_new().

  /* Write lock table if more than one object to keep atomicy */
  kind = ((is_list(BIF_ARG_2) && CDR(list_val(BIF_ARG_2)) != NIL)
          ? LCK_WRITE : LCK_WRITE_REC);

Since write locks are exclusive (no reads allowed while a
resource is write-locked), isolation will also hold.

BR,
Ulf W
--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Sverker Eriksson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ulf Wiger wrote:
> Chris Newcombe wrote:
>> Please could the isolation behavior be officially documented for
>> ets:insert, ets:insert_new, and ets:delete_all_objects?
>
> Granted, you're asking for official documentation, and I
> agree that this is important.
>
As Ulf mentioned; insert, insert_new and delete_all_objects are all both
atomic and isolated.

I can't see that we ever would want (and dare) to break that.
I will make that guarantee more explicit in the documentation for next
release.

/Sverker, Erlang/OTP




________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Chris Newcombe-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> I will make that guarantee more explicit in the documentation for next
> release.

Excellent -- thankyou very much!

Chris


On Wed, Jul 1, 2009 at 7:02 AM, Sverker
Eriksson<sverker@...> wrote:

> Ulf Wiger wrote:
>>
>> Chris Newcombe wrote:
>>>
>>> Please could the isolation behavior be officially documented for
>>> ets:insert, ets:insert_new, and ets:delete_all_objects?
>>
>> Granted, you're asking for official documentation, and I
>> agree that this is important.
>>
> As Ulf mentioned; insert, insert_new and delete_all_objects are all both
> atomic and isolated.
>
> I can't see that we ever would want (and dare) to break that.
> I will make that guarantee more explicit in the documentation for next
> release.
>
> /Sverker, Erlang/OTP
>
>
>
>

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Chris Newcombe-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> As Ulf mentioned; insert, insert_new and delete_all_objects are all both
>> atomic and isolated.
>>
>> I can't see that we ever would want (and dare) to break that.


I just noticed that the new {write_concurrency, true} option says that
write-locks might no-longer be taken at the table level.

    "Different parts of the same table can be mutated (and read) by
concurrent processes."

(full text below)

It does not say which write/read APIs are allowed to be concurrent.

So there's the usual natural tension between clean API semantics for
compound write operations, and increased concurrency.  e.g. Some
applications might want atomicity, but might care more about increased
concurrency than full isolation.  Other applications (like my current
one) might really need strong isolation.

But I guess that backwards-compatibility reasons will dominate your
decision (quite understandably).  Given the historic implicit behavior
(strong isolation) for delete_all_objects, insert, and insert_new, it
would be dangerous to change them now.   Also, strong isolation
follows the principle of least surprise.

It would be great if the updated documentation for the APIs
specifically described the isolation semantics when
{write_concurrency,true} is used.

And vice-versa too; e.g. it would good if the documentation for
write_concurrency mentioned that compound-write operations will either
acquire multiple fine-grain write-locks (i.e. acquire all necessary
locks before modifying anything), or may choose to acquire a
table-level lock, to ensure (in either case) that their historic
isolation behavior is preserved.
Therefore applications that make heavy use of compound-write
operations might see less benefit from {write_concurrency, true}.

thanks,

Chris

    "{write_concurrency,bool()} Performance tuning. Default is false,
which means that the table is optimized towards concurrent read
access. An operation that mutates (writes to) the table will obtain
exclusive access, blocking any concurrent access of the same table
until finished. If set to true, the table is optimized towards
concurrent write access. Different parts of the same table can be
mutated (and read) by concurrent processes. This is achieve to some
degree at the expense of single access and concurrent reader
performance. Table typ ordered_set is not affected by this option in
current implementation."


On Wed, Jul 1, 2009 at 7:12 AM, Chris Newcombe<chris.newcombe@...> wrote:

>> I will make that guarantee more explicit in the documentation for next
>> release.
>
> Excellent -- thankyou very much!
>
> Chris
>
>
> On Wed, Jul 1, 2009 at 7:02 AM, Sverker
> Eriksson<sverker@...> wrote:
>> Ulf Wiger wrote:
>>>
>>> Chris Newcombe wrote:
>>>>
>>>> Please could the isolation behavior be officially documented for
>>>> ets:insert, ets:insert_new, and ets:delete_all_objects?
>>>
>>> Granted, you're asking for official documentation, and I
>>> agree that this is important.
>>>
>> As Ulf mentioned; insert, insert_new and delete_all_objects are all both
>> atomic and isolated.
>>
>> I can't see that we ever would want (and dare) to break that.
>> I will make that guarantee more explicit in the documentation for next
>> release.
>>
>> /Sverker, Erlang/OTP
>>
>>
>>
>>
>

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Sverker Eriksson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris Newcombe wrote:

> I just noticed that the new {write_concurrency, true} option says that
> write-locks might no-longer be taken at the table level.
>
>     "Different parts of the same table can be mutated (and read) by
> concurrent processes."
>
> (full text below)
>
> It does not say which write/read APIs are allowed to be concurrent.
>
>  
The idea with write_concurrency was that it should be pure performance
tuning
and not change any guarentees about API semantics.

> So there's the usual natural tension between clean API semantics for
> compound write operations, and increased concurrency.  e.g. Some
> applications might want atomicity, but might care more about increased
> concurrency than full isolation.  Other applications (like my current
> one) might really need strong isolation.
>
> But I guess that backwards-compatibility reasons will dominate your
> decision (quite understandably).  Given the historic implicit behavior
> (strong isolation) for delete_all_objects, insert, and insert_new, it
> would be dangerous to change them now.   Also, strong isolation
> follows the principle of least surprise.
>
>  
True, backward-compatibility was the main reason for deciding now about
making
the atomic and isolated semantics of insert, insert_new and
delete_all_objects
to be guaranteed in the docs.
The introduction of write_concurrency was however about
backward-compatibility
with respect only to performance and not semantics.

> It would be great if the updated documentation for the APIs
> specifically described the isolation semantics when
> {write_concurrency,true} is used.
>  
> And vice-versa too; e.g. it would good if the documentation for
> write_concurrency mentioned that compound-write operations will either
> acquire multiple fine-grain write-locks (i.e. acquire all necessary
> locks before modifying anything), or may choose to acquire a
> table-level lock, to ensure (in either case) that their historic
> isolation behavior is preserved.
> Therefore applications that make heavy use of compound-write
> operations might see less benefit from {write_concurrency, true}.
>
>  
Don't you think it's enough if stated about {write_concurrency,bool()}
that is does
not break any semantic promises about atomicy and isolation. Maybe note
that operations
that makes such promises will gain less (or nothing) from
{write_concurrency,true}.
I don't think we should need to describe the current internal locking
strategy.

Also, any guarantees about atomicy and isolation only applies to the data
that the function is operating on.

/Sverker, Erlang/OTP



________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Chris Newcombe-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>Don't you think it's enough if stated about {write_concurrency,bool()} that is does
not break any semantic promises about atomicy and isolation. Maybe
note that operations
that makes such promises will gain less (or nothing) from
{write_concurrency,true}.

Yes, that would be fine IMO.


> I don't think we should need to describe the current internal locking
> strategy.

Sorry, I wasn't more clear.   I wasn't advocating exposing/documenting
implementation details -- I realize that you need as much
implementation freedom as possible.
I was trying to show that some examples of potential implementation
strategies might help users understand what is not guaranteed.
But it was a bad suggestion -- it would be better to simply refer
indirectly to the API semantics documented elsewhere, as you say.


> Also, any guarantees about atomicy and isolation only applies to the data
> that the function is operating on.

Yes.  IMO the API semantics should describe this.  However, I think
you need more detail.
e.g. Atomic+isolated operations on multiple objects are really
emulating simple transactions.  What kind of isolation guarantee is
offered?
   - 'serializable' isolation would




On Thu, Jul 2, 2009 at 3:21 AM, Sverker
Eriksson<sverker@...> wrote:

> Chris Newcombe wrote:
>>
>> I just noticed that the new {write_concurrency, true} option says that
>> write-locks might no-longer be taken at the table level.
>>
>>    "Different parts of the same table can be mutated (and read) by
>> concurrent processes."
>>
>> (full text below)
>>
>> It does not say which write/read APIs are allowed to be concurrent.
>>
>>
>
> The idea with write_concurrency was that it should be pure performance
> tuning
> and not change any guarentees about API semantics.
>
>> So there's the usual natural tension between clean API semantics for
>> compound write operations, and increased concurrency.  e.g. Some
>> applications might want atomicity, but might care more about increased
>> concurrency than full isolation.  Other applications (like my current
>> one) might really need strong isolation.
>>
>> But I guess that backwards-compatibility reasons will dominate your
>> decision (quite understandably).  Given the historic implicit behavior
>> (strong isolation) for delete_all_objects, insert, and insert_new, it
>> would be dangerous to change them now.   Also, strong isolation
>> follows the principle of least surprise.
>>
>>
>
> True, backward-compatibility was the main reason for deciding now about
> making
> the atomic and isolated semantics of insert, insert_new and
> delete_all_objects
> to be guaranteed in the docs.
> The introduction of write_concurrency was however about
> backward-compatibility
> with respect only to performance and not semantics.
>
>> It would be great if the updated documentation for the APIs
>> specifically described the isolation semantics when
>> {write_concurrency,true} is used.
>>  And vice-versa too; e.g. it would good if the documentation for
>> write_concurrency mentioned that compound-write operations will either
>> acquire multiple fine-grain write-locks (i.e. acquire all necessary
>> locks before modifying anything), or may choose to acquire a
>> table-level lock, to ensure (in either case) that their historic
>> isolation behavior is preserved.
>> Therefore applications that make heavy use of compound-write
>> operations might see less benefit from {write_concurrency, true}.
>>
>>
>
> Don't you think it's enough if stated about {write_concurrency,bool()} that
> is does
> not break any semantic promises about atomicy and isolation. Maybe note that
> operations
> that makes such promises will gain less (or nothing) from
> {write_concurrency,true}.
> I don't think we should need to describe the current internal locking
> strategy.
>
> Also, any guarantees about atomicy and isolation only applies to the data
> that the function is operating on.
>
> /Sverker, Erlang/OTP
>
>
>

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org


Re: Is ets:insert/2 (with multiple objects) isolated with respect to concurrent readers?

by Chris Newcombe-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Very sorry -- I somehow managed to get gmail to send an incomplete
message unintentionally

The unfinished part was:,

> e.g. Atomic+isolated operations on multiple objects are really
> emulating simple transactions.  What kind of isolation guarantee is
> offered?
>   - 'serializable' isolation would

I was about to say that to achieve full 'serializable' isolation may
require locking keys/objects that are currently absent (but which will
be inserted by the current insert operation).
(This is analogous to the 'phantom rows' problem in RDBMs).
i.e. It requires two-pass locking; accumulate all necessary locks,
including on the logical slots for any keys that are to be created.
If you don't do this then concurrent multi-object writes may appear to
be interleaved (not serializable) after the fact.
(Of course, acquiring a table-level lock achieves this but is
undesirable in other ways.)

There are of course other standard serialization levels to choose
from, e.g. 'repeatable-read'.

So whichever type of isolation you choose to support for ETS, it may
help users if the documentation describes it in standard vocabulary
(assuming one of the standard levels matches the semantics of ETS.)

Apologies again for the earlier incomplete message.

regards,

Chris



On Thu, Jul 2, 2009 at 8:02 AM, Chris Newcombe<chris.newcombe@...> wrote:

>>>Don't you think it's enough if stated about {write_concurrency,bool()} that is does
> not break any semantic promises about atomicy and isolation. Maybe
> note that operations
> that makes such promises will gain less (or nothing) from
> {write_concurrency,true}.
>
> Yes, that would be fine IMO.
>
>
>> I don't think we should need to describe the current internal locking
>> strategy.
>
> Sorry, I wasn't more clear.   I wasn't advocating exposing/documenting
> implementation details -- I realize that you need as much
> implementation freedom as possible.
> I was trying to show that some examples of potential implementation
> strategies might help users understand what is not guaranteed.
> But it was a bad suggestion -- it would be better to simply refer
> indirectly to the API semantics documented elsewhere, as you say.
>
>
>> Also, any guarantees about atomicy and isolation only applies to the data
>> that the function is operating on.
>
> Yes.  IMO the API semantics should describe this.  However, I think
> you need more detail.
> e.g. Atomic+isolated operations on multiple objects are really
> emulating simple transactions.  What kind of isolation guarantee is
> offered?
>   - 'serializable' isolation would
>
>
>
>
> On Thu, Jul 2, 2009 at 3:21 AM, Sverker
> Eriksson<sverker@...> wrote:
>> Chris Newcombe wrote:
>>>
>>> I just noticed that the new {write_concurrency, true} option says that
>>> write-locks might no-longer be taken at the table level.
>>>
>>>    "Different parts of the same table can be mutated (and read) by
>>> concurrent processes."
>>>
>>> (full text below)
>>>
>>> It does not say which write/read APIs are allowed to be concurrent.
>>>
>>>
>>
>> The idea with write_concurrency was that it should be pure performance
>> tuning
>> and not change any guarentees about API semantics.
>>
>>> So there's the usual natural tension between clean API semantics for
>>> compound write operations, and increased concurrency.  e.g. Some
>>> applications might want atomicity, but might care more about increased
>>> concurrency than full isolation.  Other applications (like my current
>>> one) might really need strong isolation.
>>>
>>> But I guess that backwards-compatibility reasons will dominate your
>>> decision (quite understandably).  Given the historic implicit behavior
>>> (strong isolation) for delete_all_objects, insert, and insert_new, it
>>> would be dangerous to change them now.   Also, strong isolation
>>> follows the principle of least surprise.
>>>
>>>
>>
>> True, backward-compatibility was the main reason for deciding now about
>> making
>> the atomic and isolated semantics of insert, insert_new and
>> delete_all_objects
>> to be guaranteed in the docs.
>> The introduction of write_concurrency was however about
>> backward-compatibility
>> with respect only to performance and not semantics.
>>
>>> It would be great if the updated documentation for the APIs
>>> specifically described the isolation semantics when
>>> {write_concurrency,true} is used.
>>>  And vice-versa too; e.g. it would good if the documentation for
>>> write_concurrency mentioned that compound-write operations will either
>>> acquire multiple fine-grain write-locks (i.e. acquire all necessary
>>> locks before modifying anything), or may choose to acquire a
>>> table-level lock, to ensure (in either case) that their historic
>>> isolation behavior is preserved.
>>> Therefore applications that make heavy use of compound-write
>>> operations might see less benefit from {write_concurrency, true}.
>>>
>>>
>>
>> Don't you think it's enough if stated about {write_concurrency,bool()} that
>> is does
>> not break any semantic promises about atomicy and isolation. Maybe note that
>> operations
>> that makes such promises will gain less (or nothing) from
>> {write_concurrency,true}.
>> I don't think we should need to describe the current internal locking
>> strategy.
>>
>> Also, any guarantees about atomicy and isolation only applies to the data
>> that the function is operating on.
>>
>> /Sverker, Erlang/OTP
>>
>>
>>
>

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org