Hadoop 0.19.1

Folks,

Some Hadoop deployments have upgraded to 0.19.0. Clearly, the 0.19 branch has issues and a 0.19.1 release is needed.

Quality issues in the changes made for the file append feature have prevented some from deploying Hadoop 0.19. One of these changes (sync) has now been "fixed" by reducing its semantics in Hadoop 0.18.3 (HADOOP-4997). This was necessary to stabilize the 0.18 branch.

I would like to propose that we apply this same "fix" to sync in 0.19.1 and 0.20.0. Since append requires the full semantics of sync, I propose we also disable append (perhaps throw UnsupportedOperationException from API?). Yes, this would unfortunately be an incompatible change between 0.19.0 and 0.19.1. We can then take the time needed to fix append properly in 0.21.0.

I will call a vote for 0.19.1 and 0.20.0 when blockers are fixed.

Nigel
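For readers wondering what "disable append by throwing UnsupportedOperationException from the API" might look like in practice, here is a minimal sketch. It is plain Java and not Hadoop code: `ToyFileSystem`, its methods and the `appendEnabled` flag are hypothetical names made up for illustration, and the real change may well be structured differently. The point is only that a rejected append fails loudly and leaves existing data untouched.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory file store. NOT Hadoop code; every name here is hypothetical. */
final class ToyFileSystem {
    private final Map<String, StringBuilder> files = new HashMap<>();
    private final boolean appendEnabled;

    ToyFileSystem(boolean appendEnabled) {
        this.appendEnabled = appendEnabled;
    }

    /** Creates (or overwrites) a file with the given contents. */
    void create(String path, String data) {
        files.put(path, new StringBuilder(data));
    }

    /** Appends to an existing file, or refuses outright when append is switched off. */
    void append(String path, String data) {
        if (!appendEnabled) {
            // Fail before touching any state, so previously written data cannot be damaged.
            throw new UnsupportedOperationException("append is disabled in this release");
        }
        StringBuilder contents = files.get(path);
        if (contents == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        contents.append(data);
    }

    /** Returns the file contents, or null if the file does not exist. */
    String read(String path) {
        StringBuilder contents = files.get(path);
        return contents == null ? null : contents.toString();
    }
}
```

A caller that used append before would see the exception at the first call, which is the incompatible change mentioned above.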
|
|
Re: Hadoop 0.19.1

Nigel Daley wrote:
> Folks,
>
> Some Hadoop deployments have upgraded to 0.19.0. Clearly, the 0.19
> branch has issues and a 0.19.1 release is needed.
>
> Quality issues in the changes made for the file append feature have
> prevented some from deploying Hadoop 0.19. One of these changes (sync)
> has now been "fixed" by reducing its semantics in Hadoop 0.18.3
> (HADOOP-4997). This was necessary to stabilize the 0.18 branch.
>
> I would like to propose that we apply this same "fix" to sync in 0.19.1
> and 0.20.0. Since append requires the full semantics of sync, I propose
> we also disable append (perhaps throw UnsupportedOperationException from
> API?). Yes, this would unfortunately be an incompatible change between
> 0.19.0 and 0.19.1. We can then take the time needed to fix append
> properly in 0.21.0.

I can see some people being unhappy about this, but given a choice between having the filesystem work or not, hopefully they will see the merits of the change. And I am +1 to taking time to fix things; fast fixes often create new problems.
|
|
Re: Hadoop 0.19.1

The proposal below pushes a working append out 6 months or so.

Can we have pointers to what is meant by 'quality issues' below, so we can make a more informed vote? Or is it just the HDFS issue posted against 0.19.1? Is that also where we would find what is involved in making append work in 0.19.1?

Thanks,
St.Ack

On Fri, Jan 30, 2009 at 2:36 AM, Steve Loughran <stevel@...> wrote:
> Nigel Daley wrote:
>
>> Folks,
>>
>> Some Hadoop deployments have upgraded to 0.19.0. Clearly, the 0.19 branch
>> has issues and a 0.19.1 release is needed.
>>
>> Quality issues in the changes made for the file append feature have
>> prevented some from deploying Hadoop 0.19. One of these changes (sync) has
>> now been "fixed" by reducing its semantics in Hadoop 0.18.3 (HADOOP-4997).
>> This was necessary to stabilize the 0.18 branch.
>>
>> I would like to propose that we apply this same "fix" to sync in 0.19.1
>> and 0.20.0. Since append requires the full semantics of sync, I propose we
>> also disable append (perhaps throw UnsupportedOperationException from API?).
>> Yes, this would unfortunately be an incompatible change between 0.19.0 and
>> 0.19.1. We can then take the time needed to fix append properly in 0.21.0.
>>
>
> I can see some people being unhappy about this, but giving them a choice
> between having the filesystem work or not, hopefully they will see the
> merits of the change. And I am +1 to taking time to fix things; fast fixes
> often create new problems
>
|
|
RE: Hadoop 0.19.1

Are you proposing disabling both append and sync?

While HBase does not use the full semantics of append, we do depend on sync. There is a patch for trunk (HADOOP-4379) that we are testing on the 0.19 branch, the 0.20 branch and trunk.

-1 if sync does not work until 0.21.1: recovery from server crashes depends entirely on sync working.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)

> -----Original Message-----
> From: Nigel Daley [mailto:ndaley@...]
> Sent: Thursday, January 29, 2009 4:16 PM
> To: core-dev@...
> Subject: Hadoop 0.19.1
>
> Folks,
>
> Some Hadoop deployments have upgraded to 0.19.0. Clearly, the 0.19
> branch has issues and a 0.19.1 release is needed.
>
> Quality issues in the changes made for the file append feature have
> prevented some from deploying Hadoop 0.19. One of these changes
> (sync) has now been "fixed" by reducing its semantics in Hadoop 0.18.3
> (HADOOP-4997). This was necessary to stabilize the 0.18 branch.
>
> I would like to propose that we apply this same "fix" to sync in
> 0.19.1 and 0.20.0. Since append requires the full semantics of sync,
> I propose we also disable append (perhaps throw
> UnsupportedOperationException from API?). Yes, this would
> unfortunately be an incompatible change between 0.19.0 and 0.19.1. We
> can then take the time needed to fix append properly in 0.21.0.
>
> I will call a vote for 0.19.1 and 0.20.0 when blockers are fixed.
>
> Nigel
|
|
Re: Hadoop 0.19.1

stack wrote:
> The below proposes pushing a working append out 6 months or so.
>
> Can we have pointers as to what is meant by 'quality issues' in the below so
> we can make a more informed vote or is it just the hdfs issue posted against
> 0.19.1?
> Is it there also that we would find what is involved making append
> work in 0.19.1?

If one knew what would be enough to fix it properly, it would be easy. But over the last couple of months there have been many fixes (some of these JIRAs are listed in a comment of Konstantin's, HADOOP-4663). The discussions are still bringing up more cases where the implementation or algorithm should change. These are improvements for sure, but I doubt I would be ready to call it 'completely fixed'. It needs time and a lot of testing on large clusters.

Personally I am +1 for getting these into the 0.19 branch. Most importantly, even clusters and applications not using append or sync were affected; that's why the extra caution.

My 2 cents. Hope this does not digress too much from the main topic.

Raghu.
|
|
Re: Hadoop 0.19.1

Raghu Angadi wrote:
>> Is it there also that we would find what is involved making append
>> work in 0.19.1?
>
> If one knew what is enough to fix properly, it would be easy. But over
> last couple of months, there have been many fixes (some of these jiras
> are listed in one Konstantins HADOOP-4663).

This is the comment of Konstantin's that I referred to (it is also linked from HADOOP-4663, but harder to find there):

https://issues.apache.org/jira/browse/HADOOP-5027#action_12668136

Raghu.
|
|
Re: Hadoop 0.19.1

Raghu, thanks for providing the link.

Jim> Are proposing disabling both append and sync?

Jim, this statement is probably too strong. sync is not disabled per se: you will be able to use it, although its full semantics are not guaranteed in some failure scenarios. See more here:
https://issues.apache.org/jira/browse/HADOOP-4663#action_12661802

We will really have to disable append (throw UnsupportedOperationException), because otherwise the current solution may lead to loss of previously existing data.

I agree with Nigel that there is a need for an urgent 0.19.1 release, because a lot of bugs have been fixed since 0.18.2 and 0.19.0. The system is now stable on our clusters with 0.18.3, and the same fixes went into 0.19.1.

If we try to rush fixing the bugs for append (listed in my comment), we risk destabilizing the system again, and this is my main concern.

Formally we should not release until a feature is fixed, but I think it is better to let people use a stable release with limited functionality than to have full functionality with a risk of data loss.

Hope this will work for everybody.
--Konstantin
|
|
Re: Hadoop 0.19.1

Hi Konstantin,

We are also heavy users of fsync(). I've been working with Dhruba on HADOOP-4379 <https://issues.apache.org/jira/browse/HADOOP-4379>. His most recent patch appears to work for Jim's situation. However, there are still a couple of problems that need to be resolved before we can start using it heavily:

1. After an application crash/restart, the file length (as returned by getFileStatus) is incorrect, since the length at the namenode is stale. Ideally getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode. If that's not feasible, there should be some other way to obtain the actual file length.

2. When an application comes up after a crash, it seems to hang for about 60 seconds waiting for lease recovery. Our database cannot go offline for a whole minute doing nothing. In our case, when we come up after a crash and try to re-open the log file, we know for certain that we are the exclusive owner of that file. There should be a way to tell the system to forcibly take over the lease and recover immediately.

What do you recommend? Is there any way we could get these two issues fixed for 0.19.1, or should I file issues for them and get them on the schedule for 0.19.2?

- Doug

On Mon, Feb 2, 2009 at 11:59 AM, Konstantin Shvachko <shv@...> wrote:
> Raghu, thanks for providing the link.
>
> Jim> Are proposing disabling both append and sync?
>
> Jim, this statement is probably too strong.
> sync is not disabled per se, you will be able to use it, although
> its full semantic is not guaranteed in some failure scenarios.
> See more here
> https://issues.apache.org/jira/browse/HADOOP-4663#action_12661802
>
> We will have to really disable append (throw UnsupportedOperationException)
> because otherwise current solution may lead to a loss of previously existed
> data.
>
> I agree with Nigel that there is a need for an urgent 0.19.1 release
> because a lot of bugs were fixed since 0.18.2 and 0.19.0.
> The system is now stable on our clusters with 0.18.3 same fixes went into
> 0.19.1.
>
> If we try to rush fixing the bugs for append (listed in my comment) we
> risk to destabilize the system again, and this is my main concern.
>
> Formally we should not release until a feature is fixed, but I think it
> is better to let people use a stable release with limited functionality
> rather than having full functionality with a risk of data loss.
>
> Hope this will work for everybody.
> --Konstantin
|
|
Re: Hadoop 0.19.1

On Feb 2, 2009, at 12:51 PM, Doug Judd wrote:
> What do you recommend? Is there anyway we could get these two issues fixed
> for 0.19.1, or should I file issues for them and get them on the schedule
> for 0.19.2?

Given the outstanding problems and general level of uncertainty, I'd favor releasing a 0.19.1 with the equivalent of the 0.18.3 disable on fsync and append. Let's get them fixed in 0.20 first and then we can debate whether the rewards of pushing them back into an 0.19.2 would make sense. I'm pretty uncomfortable at the moment with how the entire functional complex seems to cause a continuous stream of problems.

-- Owen
|
|
RE: Hadoop 0.19.1

HBase could live with getting a working sync() in 0.20.0 (preferably in 0.19.2), but having it in 0.20.0 is an absolute blocker for us.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)

> -----Original Message-----
> From: Owen O'Malley [mailto:owen.omalley@...] On Behalf Of Owen O'Malley
> Sent: Monday, February 02, 2009 1:51 PM
> To: core-dev@...
> Subject: Re: Hadoop 0.19.1
>
> On Feb 2, 2009, at 12:51 PM, Doug Judd wrote:
>
> > What do you recommend? Is there anyway we could get these two issues fixed
> > for 0.19.1, or should I file issues for them and get them on the schedule
> > for 0.19.2?
>
> Given the outstanding problems and general level of uncertainty, I'd
> favor releasing a 0.19.1 with the equivalent of the 0.18.3 disable on
> fsync and append. Let's get them fixed in 0.20 first and then we can
> debate whether the rewards of pushing them back into an 0.19.2 would
> make sense. I'm pretty uncomfortable at the moment with how the entire
> functional complex seems to cause a continuous stream of problems.
>
> -- Owen
|
|
Re: Hadoop 0.19.1

Sounds good. I would much rather wait and have fsync() done correctly in 0.20 than get some sort of hacked version in 0.19. I'll create a couple of issues and mark them for 0.20. Thanks.

- Doug

On Mon, Feb 2, 2009 at 1:51 PM, Owen O'Malley <omalley@...> wrote:
> On Feb 2, 2009, at 12:51 PM, Doug Judd wrote:
>
>> What do you recommend? Is there anyway we could get these two issues fixed
>> for 0.19.1, or should I file issues for them and get them on the schedule
>> for 0.19.2?
>
> Given the outstanding problems and general level of uncertainty, I'd favor
> releasing a 0.19.1 with the equivalent of the 0.18.3 disable on fsync and
> append. Let's get them fixed in 0.20 first and then we can debate whether
> the rewards of pushing them back into an 0.19.2 would make sense. I'm pretty
> uncomfortable at the moment with how the entire functional complex seems to
> cause a continuous stream of problems.
>
> -- Owen
|
|
Re: Hadoop 0.19.1

> What do you recommend?

In general: there may be people/organizations that will not accept reduced functionality in exchange for stability, which is understandable. I would propose creating a separate (unofficial, experimental) branch to track changes like HADOOP-4379. The branch may later either die once the mainline is fixed, or be merged into trunk if the changes prove to be stable.

> 1. the file length (as returned by getFileStatus) is incorrect

Maybe the following workaround will be useful: when you read from a file, always try to read more data than the length reported by the name-node. How much more? The size of one block would be enough, or even just up to the next (ceiling) block boundary.

> 2. When an application comes up after a crash, it seems to hang for about 60

I don't have enough context on that, sorry.

Thanks,
--Konstantin
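The read-past-the-reported-length workaround described above can be sketched roughly as follows. This is plain Java against a generic `InputStream`, not the Hadoop client API: `StaleLengthProbe` and both method names are hypothetical, and the sketch assumes the stream is positioned at the start of the file, that reads past the stale length actually return the extra bytes, and that the "ceiling" boundary means the first block boundary strictly after the reported length.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of Hadoop: probes how many bytes are really readable. */
final class StaleLengthProbe {

    /** First block boundary strictly beyond the (possibly stale) reported length. */
    static long nextBlockBoundary(long reportedLength, long blockSize) {
        if (reportedLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and block size > 0");
        }
        return (reportedLength / blockSize + 1) * blockSize;
    }

    /**
     * Reads from the start of the stream up to the next block boundary after
     * reportedLength and returns how many bytes were actually available.
     */
    static long probeLength(InputStream in, long reportedLength, long blockSize) {
        long limit = nextBlockBoundary(reportedLength, blockSize);
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            while (total < limit) {
                int want = (int) Math.min(buffer.length, limit - total);
                int n = in.read(buffer, 0, want);
                if (n < 0) {
                    break; // real end of data reached before the boundary
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

For example, with a 64-byte block size, a reported length of 128 and 150 bytes actually readable, the probe reads up to byte 192 and reports 150. Reading the whole file just to learn its length is wasteful; a real client would presumably seek to the reported length first, which this sketch leaves out.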
|
|
Re: Hadoop 0.19.1

Comments inline ...

On Mon, Feb 2, 2009 at 4:23 PM, Konstantin Shvachko <shv@...> wrote:
> > What do you recommend?
>
> In general. There may be people/organizations, which will not compromise
> on the reduced functionality in favor of the stability, this is
> understandable.
> I would propose to create a separate (unofficial experimental) branch, which
> would track changes like HADOOP-4379. The branch may later either die when the
> main stream is fixed or be merged with the trunk if the changes proved to
> be stable.

Sure, that sounds reasonable. One thing I would caution against is spending a lot of time doing incremental patchwork on something that needs a ground-up overhaul. I would much rather wait a couple of months longer and get software that is based on a well-thought-out design that is fundamentally sound. Ultimately that will be the fastest path to stability.

> >1. the file length (as returned by getFileStatus) is incorrect
>
> May be the following work around will be useful.
> If you read from a file you always try to read more data than the length reported
> by the name-node. How much more? The size of one block would be enough, or
> even to the next (ceiling) block boundary.

I could certainly implement a workaround. However, from an API standpoint, the filesystem (IMHO) should always give you a way to obtain the real length of the file. The semantics of the current getFileStatus() make it difficult to reason about the state of your filesystem: it basically returns a "possibly stale" version of the length. I would prefer to wait for an implementation that gives an accurate answer and spend my time and energy helping to test that one, rather than spending a bunch of time implementing a workaround for the current version.

> >2. When an application comes up after a crash, it seems to hang for about 60
>
> Don't have enough context on that, sorry.

I spoke too soon on this. The reason HDFS was hanging on lease recovery was that I was opening the file in append mode to force lease recovery (at Dhruba's suggestion), so that it would update the NameNode with the proper length. If I had a method of obtaining the accurate length of the file, I wouldn't need to do this. Hence, I didn't bother filing an issue on this.

- Doug
|
|
Re: Hadoop 0.19.1

On Feb 2, 2009, at 1:51 PM, Owen O'Malley wrote:
> On Feb 2, 2009, at 12:51 PM, Doug Judd wrote:
>
> > What do you recommend? Is there anyway we could get these two issues fixed
> > for 0.19.1, or should I file issues for them and get them on the schedule
> > for 0.19.2?
>
> Given the outstanding problems and general level of uncertainty, I'd
> favor releasing a 0.19.1 with the equivalent of the 0.18.3 disable on
> fsync and append. Let's get them fixed in 0.20 first and then we can
> debate whether the rewards of pushing them back into an 0.19.2 would
> make sense. I'm pretty uncomfortable at the moment with how the entire
> functional complex seems to cause a continuous stream of problems.
>
> -- Owen

The append/sync code has been very problematic, and bug fixes have not always fixed things completely. Hence getting a quick release of 0.19.1 that deals with the data loss (i.e. 0.18.3 functionality) is important.

sanjay
|
|
Re: Hadoop 0.19.1

On Feb 2, 2009, at 6:18 PM, Doug Judd wrote:
> Comments inline ...
>
> On Mon, Feb 2, 2009 at 4:23 PM, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
>
> > > What do you recommend?
> >
> > In general. There may be people/organizations, which will not compromise
> > on the reduced functionality in favor of the stability, this is
> > understandable.
> > I would propose to create a separate (unofficial experimental) branch,
> > which would track changes like HADOOP-4379. The branch may later either
> > die when the main stream is fixed or be merged with the trunk if the
> > changes proved to be stable.
>
> Sure, that sounds reasonable. One thing I would caution against is spending
> a lot of time doing incremental patchwork on something that needs a
> ground-up overhaul. I would much rather wait a couple of months longer and
> get software that is based on a well thought out design that is
> fundamentally sound. Ultimately that will be the fastest path to stability.

Agree.

sanjay
|
|
Re: Hadoop 0.19.1

On Feb 2, 2009, at 4:23 PM, Konstantin Shvachko wrote:
> > What do you recommend?
>
> In general. There may be people/organizations, which will not compromise
> on the reduced functionality in favor of the stability, this is
> understandable.
> I would propose to create a separate (unofficial experimental) branch, which
> would track changes like HADOOP-4379. The branch may later either die when the
> main stream is fixed or be merged with the trunk if the changes
> proved to be stable.

This is a very interesting suggestion. Many in the team have come to the conclusion that complex projects like append should be done on a separate branch in the first place, and integrated with trunk when the project is stable.

sanjay
|
|
Re: Hadoop 0.19.1

Sanjay Radia wrote:
>
> On Feb 2, 2009, at 4:23 PM, Konstantin Shvachko wrote:
>
>>
>> > What do you recommend?
>>
>> In general. There may be people/organizations, which will not compromise
>> on the reduced functionality in favor of the stability, this is
>> understandable.
>> I would propose to create a separate (unofficial experimental) branch,
>> which would track changes like HADOOP-4379. The branch may later either die
>> when the main stream is fixed or be merged with the trunk if the changes proved
>> to be stable.
>>
>
> This is very a interesting suggestion.
> Many in the team have come to the conclusion that complex projects like
> append should be done on a separate branch in the first place and
> integrated with trunk when the project is stable.
>

There's a lot to be said for branching; I'm also looking at git so I can do my service lifecycle stuff under SCM properly.

But the cost of merging can be high. I'd estimate 1 morning/week is spent updating my local SVN and then checking that everything still works. If Hudson could test both the branches and any merged branches, life would be better.

The other problem is incompatible branches: the more branches you have live, the higher the merge cost.

That said, Git promises wonderful things, and we ought to be able to set up Apache support for git for people wanting to do their own branches. svn would still be the official SCM tool.
|
|
Re: Hadoop 0.19.1

On Tue, Feb 3, 2009 at 7:02 PM, Sanjay Radia <sradia@...> wrote:
>
> Many in the team have come to the conclusion that complex projects like
> append should be done on a separate branch in the first place and integrated
> with trunk when the project is stable.
>

How do we determine when a feature on the branch is 'stable'? Is there a test suite to run beyond hudson's running of unit tests?

St.Ack
|
|
Re: Hadoop 0.19.1

On Thu, Jan 29, 2009 at 4:16 PM, Nigel Daley <ndaley@...> wrote:
>
> I would like to propose that we apply this same "fix" to sync in 0.19.1 and
> 0.20.0. Since append requires the full semantics of sync, I propose we also
> disable append (perhaps throw UnsupportedOperationException from API?).
> Yes, this would unfortunately be an incompatible change between 0.19.0 and
> 0.19.1. We can then take the time needed to fix append properly in 0.21.0.
>

+1 on a 0.19.1 release with a 0.18.3-like solution removing some of the sync semantics. A stable HDFS takes precedence.

Please count us hbasistas in when testing of append is needed. Applications like HBase are plain broke without it (you may have heard us mention this fact once or twice in the past -- smile).

Good stuff,
St.Ack
|
|
Re: Hadoop 0.19.1

On Feb 4, 2009, at 3:38 AM, Steve Loughran wrote:
> There's a lot to be said for branching; I'm also looking at git so I can
> do my service lifecycle stuff under SCM properly.
>
> but the cost of merging can be high. I'd estimate 1 morning/week is
> spent updating my local SVN and then seeing that everything still works.
> If hudson could both test the branches and test any merged branches,
> life would be better

I agree on the cost of merging. When a project is branched, after a while one can spend as much as 30% of cycles merging in changes.

But when a system is used in production to store data, we cannot afford to have users lose their data. The team at Yahoo had to scramble to recover the lost data and put in several emergency patches to deal with the append code.

I am all for extending Hudson testing to branches, but Hudson testing, while helpful, will not be sufficient for big projects, because Hudson does not have a comprehensive set of tests. Each new release is tested significantly beyond the Hudson tests.

For me the lesson is that large, complex projects should be branched. (This is how commercial software products are engineered.) There will be increased cost to the project team, but overall the community will have more solid releases, and the total cost to the community of delivering the technology will be smaller.

sanjay

> The other problem is incompatible branches: the more branches you have
> live, the higher the merge cost.
>
> That said, Git promises wonderful things, and we ought to be able to set
> up Apache support for git for people wanting to do their own branches
> -svn would still be the official SCM tool