Diffstat (limited to 'src/ceph/doc/changelog/v0.48.1argonaut.txt')
-rw-r--r-- src/ceph/doc/changelog/v0.48.1argonaut.txt 1286
1 files changed, 1286 insertions, 0 deletions
diff --git a/src/ceph/doc/changelog/v0.48.1argonaut.txt b/src/ceph/doc/changelog/v0.48.1argonaut.txt
new file mode 100644
index 0000000..cdd557f
--- /dev/null
+++ b/src/ceph/doc/changelog/v0.48.1argonaut.txt
@@ -0,0 +1,1286 @@
+commit a7ad701b9bd479f20429f19e6fea7373ca6bba7c
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Aug 13 14:58:51 2012 -0700
+
+ v0.48.1argonaut
+
+commit d4849f2f8a8c213c266658467bc5f22763010bc2
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Wed Aug 1 13:22:38 2012 -0700
+
+ rgw: fix usage trim call encoding
+
+ Fixes: #2841.
+ Usage trim operation was encoding the wrong op structure (usage read).
+ Since the structures somewhat overlapped it somewhat worked, but user
+ info wasn't encoded.
+
+ Backport: argonaut
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 515952d07107d442889754ec3bd6a344fad25d58
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Wed Aug 8 15:21:53 2012 -0700
+
+ cls_rgw: fix rgw_cls_usage_log_trim_op encode/decode
+
+ The op was not encoding the user; add that and reset version
+ compatibility.
+ This change affects the command interface and makes the old
+ radosgw-admin usage trim incompatible. Use of the old
+ radosgw-admin usage trim should be avoided, as it may
+ remove more data than requested. In any case, upgraded
+ server code will not handle old clients' trim requests.
+
+ backport: argonaut
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 2e77130d5c80220be1612b5499d422de620d2d0b
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Tue Jul 31 16:17:22 2012 -0700
+
+ rgw: expand date format support
+
+ Relaxing the date format parsing function to allow UTC
+ instead of GMT.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
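A minimal sketch of the relaxed date parsing described in the commit above, written in Python rather than the C++ used by rgw. The helper name `parse_http_date` and the exact format string are assumptions for illustration; only the idea of accepting "UTC" as well as "GMT" comes from the commit message. It assumes an English (C) locale for day and month names.

```python
from datetime import datetime, timezone


def parse_http_date(value):
    """Parse an HTTP-style date whose zone suffix is either GMT or UTC.

    Hypothetical helper: the real rgw parser is C++ and may accept
    additional formats not shown here.
    """
    text = value.strip()
    for zone in ("GMT", "UTC"):
        if text.endswith(" " + zone):
            # Drop the zone token and parse the remaining timestamp.
            stamp = text[: -len(zone)].rstrip()
            parsed = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("unsupported date format: %r" % value)
```

Both spellings then map to the same instant, so a client that sends "UTC" is no longer rejected.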
+commit 14fa77d9277b5ef5d0c6683504b368773b39ccc4
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Aug 2 11:13:05 2012 -0700
+
+ rgw: complete multipart upload can handle chunked encoding
+
+ Fixes: #2878
+ We now allow complete multipart upload to use chunked encoding
+ when sending request data. With chunked encoding the HTTP_LENGTH
+ header is not required.
+
+ Backport: argonaut
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit a06f7783fbcc02e775fc36f30e422fe0f9e0ec2d
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Wed Aug 1 11:19:32 2012 -0700
+
+ rgw_xml: xml_handle_data() appends data string
+
+ Fixes: #2879.
+ xml_handle_data() appends data to the object instead of just
+ replacing it. Parsed data can arrive in pieces, specifically
+ when data is escaped.
+
+ Backport: argonaut
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
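The append-versus-replace behaviour in the commit above can be sketched with Python's expat bindings. This is an illustration only: the `Element` class and `parse_text` function are hypothetical, and rgw's own `xml_handle_data()` is C++. The point it shows is that a parser may deliver character data in several callbacks (for example around escaped entities), so the handler has to accumulate.

```python
import xml.parsers.expat


class Element:
    """Hypothetical stand-in for an XML object that collects text."""

    def __init__(self):
        self.data = ""

    def handle_data(self, chunk):
        # Append each piece; replacing would keep only the last piece.
        self.data += chunk


def parse_text(document):
    element = Element()
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = element.handle_data
    parser.Parse(document, True)
    return element.data
```

With a replacing handler, a value such as `a&amp;b` could be truncated to whatever piece happened to arrive last.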
+commit a8b224b9c4877a559ce420a2e04f19f68c8c5680
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Wed Aug 1 13:09:41 2012 -0700
+
+ rgw: ETag is unquoted in multipart upload complete
+
+ Fixes #2877.
+ Removing quotes from ETag before comparing it to what we
+ have when completing a multipart upload.
+
+ Backport: argonaut
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
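A small Python sketch of the comparison described in the commit above. The function names are hypothetical and the real check lives in rgw's C++ code; the sketch only shows stripping one pair of surrounding double quotes before comparing ETags.

```python
def unquote_etag(etag):
    """Remove one pair of surrounding double quotes, if present."""
    etag = etag.strip()
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag


def etag_matches(client_etag, stored_etag):
    # Compare the values with any surrounding quotes ignored on both sides.
    return unquote_etag(client_etag) == unquote_etag(stored_etag)
```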
+commit 22259c6efda9a5d55221fd036c757bf123796753
+Author: Josh Durgin <josh.durgin@inktank.com>
+Date: Wed Aug 8 15:24:57 2012 -0700
+
+ MonMap: return error on failure in build_initial
+
+ If mon_host fails to parse, return an error instead of success.
+ This avoids failing later on an assert monmap.size() > 0 in the
+ monmap in MonClient.
+
+ Fixes: #2913
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit 49b2c7b5a79b8fb4a3941eca2cb0dbaf22f658b7
+Author: Josh Durgin <josh.durgin@inktank.com>
+Date: Wed Aug 8 15:10:27 2012 -0700
+
+ addr_parsing: report correct error message
+
+ getaddrinfo uses its return code to report failures.
+
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit 7084f29544f431b7c6a3286356f2448ae0333eda
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Aug 8 14:01:53 2012 -0700
+
+ mkcephfs: use default osd_data, _journal values
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Greg Farnum <greg@inktank.com>
+
+commit 96b1a496cdfda34a5efdb6686becf0d2e7e3a1c0
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Aug 8 14:01:35 2012 -0700
+
+ mkcephfs: use new default keyring locations
+
+ The ceph-conf command only parses the conf; it does not apply default
+ config values. This breaks mkcephfs if values are not specified in the
+ config.
+
+ Let ceph-osd create its own key, fix copying, and fix creation/copying for
+ the mds.
+
+ Fixes: #2845
+ Reported-by: Florian Haas <florian@hastexo.com>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Greg Farnum <greg@inktank.com>
+
+commit 4bd466d6ed49c7192df4a5bf0d63bda5d7d7dd9a
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 31 14:01:57 2012 -0700
+
+ osd: peering: detect when log source osd goes down
+
+ The Peering state has a generic check based on the prior set osds that
+ will restart peering if one of them goes down (or one of the interesting
+ down ones comes up). The GetLog state, however, can pull the log from
+ a peer that is not in the prior set if it got a notify from them (e.g., an
+ osd in an old interval that was down when the prior set was calculated).
+ If that osd goes down, we don't detect it and will block forever.
+
+ Fix by adding a simple check in GetLog for the newest_update_osd going
+ down.
+
+ (BTW GetMissing does not suffer from this problem because
+ peer_missing_requested is a subset of the prior set, so the Peering check
+ is sufficient.)
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Samuel Just <sam.just@inktank.com>
+
+commit 87defa88a0c6d6aafaa65437a6e4ddd92418f834
+Author: Sylvain Munaut <tnt@246tNt.com>
+Date: Tue Jul 31 11:55:56 2012 -0700
+
+ rbd: fix off-by-one error in key name
+
+ Fixes: #2846
+ Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
+
+commit 37d5b46269c8a4227e5df61a88579d94f7b56772
+Author: Sylvain Munaut <tnt@246tNt.com>
+Date: Tue Jul 31 11:54:29 2012 -0700
+
+ secret: return error on empty secret
+
+ Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
+
+commit 7b9d37c662313929b52011ddae47cc8abab99095
+Author: Sage Weil <sage@inktank.com>
+Date: Sat Jul 28 10:05:47 2012 -0700
+
+ osd: set STRAY on pg load when non-primary
+
+ The STRAY bit indicates that we should announce ourselves to the primary,
+ but it is only set in start_peering_interval(). We also need to set it
+ initially, so that a PG that is loaded but whose role does not change
+ (e.g., the stray replica stays a stray) will notify the primary.
+
+ Observed:
+ - osd starts up
+ - mapping does not change, STRAY not set
+ - does not announce to primary
+ - primary does not re-check must_have_unfound, objects appear unfound
+
+ Fix this by initializing STRAY when pg is loaded or created whenever we
+ are not the primary.
+
+ Fixes: #2866
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 96feca450c5505a06868bc012fe998a03371b77f
+Author: Sage Weil <sage@inktank.com>
+Date: Fri Jul 27 16:03:26 2012 -0700
+
+ osd: peering: make Incomplete a Peering substate
+
+ This allows us to still catch changes in the prior set that would affect
+ our conclusions (that we are incomplete) and, when they happen, restart
+ peering.
+
+ Consider:
+ - calc prior set, osd A is down
+ - query everyone else, no good info
+ - set down, go to Incomplete (previously WaitActingChange) state.
+ - osd A comes back up (we do nothing)
+ - osd A sends notify message with good info (we ignore)
+
+ By making this a Peering substate, we catch the Peering AdvMap reaction,
+ which will notice a prior set down osd is now up and move to Reset.
+
+ Fixes: #2860
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit a71e442fe620fa3a22ad9302413d8344a3a1a969
+Author: Sage Weil <sage@inktank.com>
+Date: Fri Jul 27 15:39:40 2012 -0700
+
+ osd: peering: move to Incomplete when.. incomplete
+
+ PG::choose_acting() may return false and *not* request an acting set change
+ if it can't find any suitable peers with enough info to recover. In that
+ case, we should move to Incomplete, not WaitActingChange, just like we do
+ a bit lower in GetLog() if we have non-contiguous logs. The state name is
+ more accurate, and this is also needed to fix bug #2860.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 623026d9bc8ea4c845eb3b06d79e0ca9bef50deb
+Merge: 87b6e80 9db7809
+Author: Sage Weil <sage@inktank.com>
+Date: Fri Jul 27 14:00:52 2012 -0700
+
+ Merge remote-tracking branch 'gh/stable' into stable-next
+
+commit 9db78090451e609e3520ac3e57a5f53da03f9ee2
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 26 16:35:00 2012 -0700
+
+ osd: fixing sharing of past_intervals on backfill restart
+
+ We need to share past_intervals whenever we instantiate the PG on a peer.
+ In the PG activation case, this is based on whether our peer_info[] value
+ for that peer is dne(). However, the backfill code was updating the
+ peer info (history) in the block preceding the dne() check, which meant
+ we never shared past_intervals in this case and the peer would have to
+ chew through a potentially large number of maps if the PG has not been
+ clean recently.
+
+ Fix by checking dne() prior to the backfill block. We still need to fill
+ in the message later because it isn't yet instantiated.
+
+ Fixes: #2849
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 87b6e8045a3a1ff6439d2684e960ad0dc8988b33
+Merge: 81d72e5 7dfdf4f
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 26 15:04:12 2012 -0700
+
+ Merge remote-tracking branch 'gh/wip-rbd-bid' into stable-next
+
+commit 81d72e5d7ba4713eb7c290878d901e21c0709028
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Jul 23 10:47:10 2012 -0700
+
+ mon: make 'ceph osd rm ...' wipe out all state bits, not just EXISTS
+
+ This ensures that when a new osd reclaims that id it behaves as if it were
+ really new.
+
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit ad9c37f2c029f6eb372efb711b234014397057e9
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Jul 9 20:54:19 2012 -0700
+
+ test_stress_watch: just one librados instance
+
+ This was creating a new cluster connection/session per iteration, and
+ along with it a few service threads and sockets and so forth.
+
+ Unfortunately, librados leaks like a sieve, starting with CephContext
+ and ceph::crypto::init(). See #845 and #2067.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit c60afe1842a48dd75944822c0872fce6a7229f5a
+Merge: 8833050 35b1326
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 26 15:03:50 2012 -0700
+
+ Merge commit '35b13266923f8095650f45562d66372e618c8824' into stable-next
+
+ First batch of msgr fixes.
+
+commit 88330505cc772a5528e9405d515aa2b945b0819e
+Author: Samuel Just <sam.just@inktank.com>
+Date: Mon Jul 9 15:53:31 2012 -0700
+
+ ReplicatedPG: fix replay op ordering
+
+ After a client reconnect, the client replays outstanding ops. The
+ OSD then immediately responds with success if the op has already
+ committed (version < ReplicatedPG::get_first_in_progress).
+ Otherwise, we stick it in waiting_for_ondisk to be replied to when
+ eval_repop concludes that waitfor_disk is empty.
+
+ Fixes #2508
+
+ Signed-off-by: Samuel Just <sam.just@inktank.com>
+
+ Conflicts:
+
+ src/osd/ReplicatedPG.cc
+
+commit 682609a9343d0488788b1c6b03bc437b7905e4d6
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 18 12:55:35 2012 -0700
+
+ objecter: always resend linger registrations
+
+ If a linger op (watch) is sent to the OSD and updates the object, and then
+ the client loses the reply, it will resend the request. The OSD will see
+ that it is a dup, however, and not set up the in-memory session state for
+ the watch. This in turn will break the watch (i.e., notifies won't
+ get delivered).
+
+ Instead, always resend linger registration ops, so that we always have a
+ unique reqid and do the correct session registration for each session.
+
+ * track the tid of the registration op for each LingerOp
+ * mark registration ops as should_resend=false; cancel as needed
+ * when we send a new registration op, cancel the old one to ensure we
+ ignore the reply. This is needed because we resend linger ops on any
+ pg change, not just a primary change.
+ * drop the first_send arg to send_linger(), as we can now infer that
+ from register_tid == 0.
+
+ The bug was easily reproduced with ms inject socket failures = 500 and the
+ test_stress_watch utility.
+
+ Fixes: #2796
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit 4d7d3e276967d555fed8a689976047f72c96c2db
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Jul 9 13:22:42 2012 -0700
+
+ osd: guard class call decoding
+
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 7fbbe4652ffb2826978aa1f1cacce4456d2ef1fc
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 5 18:08:58 2012 -0700
+
+ librados: take lock when signaling notify cond
+
+ When we are signaling the cond to indicate that a notify is complete,
+ take the appropriate lock. This removes the possibility of a race
+ that loses our signal. (That would be very difficult given that there
+ are network round trips involved, but this makes the lock/cond usage
+ "correct.")
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 6ed01df412b4f4745c8f427a94446987c88b6bef
+Author: Sage Weil <sage@inktank.com>
+Date: Sun Jul 22 07:46:11 2012 -0700
+
+ workqueue: kick -> wake or _wake, depending on locking
+
+ Break kick() into wake() and _wake() methods, depending on whether the
+ lock is already held. (The rename ensures that we audit/fix all
+ callers.)
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+ Conflicts:
+
+ src/common/WorkQueue.h
+ src/osd/OSD.cc
+
+commit d2d40dc3059d91450925534f361f2c03eec9ef88
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 4 15:11:21 2012 -0700
+
+ client: fix locking for SafeCond users
+
+ Need to wait on flock, not client_lock.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit c963a21a8620779d97d6cbb51572551bdbb50d0b
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 26 15:01:05 2012 -0700
+
+ filestore: check for EIO in read path
+
+ Check for EIO in read methods and helpers. Try to do checks in low-level
+ methods (e.g., lfn_*()) to avoid duplication in higher-level methods.
+
+ The transaction apply function already checks for EIO on writes, and will
+ generate a nicer error message, so we can largely ignore the write path,
+ as long as errors get passed up correctly.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 6bd89aeb1bf3b1cbb663107ae6bcda8a84dd8601
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jul 26 09:07:46 2012 -0700
+
+ filestore: add 'filestore fail eio' option, default true
+
+ By default we will assert/fail/crash on EIO from the underlying fs. We
+ already do this in the write path, but not the read path, or in various
+ internal infrastructure.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit e9b5a289838f17f75efbf9d1640b949e7485d530
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 24 13:53:03 2012 -0700
+
+ config: fix 'config set' admin socket command
+
+ Fixes: #2832
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 1a6cd9659abcdad0169fe802ed47967467c448b3
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 25 16:35:09 2012 -0700
+
+ osd: break potentially large transaction into pieces
+
+ We do a similar trick elsewhere. Control this via a tunable. Eventually
+ we'll control the others (in a non-stable branch).
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 15e1622959f5a46f7a98502cdbaebfda2247a35b
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 25 14:53:34 2012 -0700
+
+ osd: only commit past intervals at end of parallel build
+
+ We don't check for gaps in the past intervals, so we should only commit
+ this when we are completely done. Otherwise a partial run and restart will
+ leave the gap in place, which may confuse the peering code that relies on
+ this information.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 16302acefd8def98fc4597366d6ba2845e17fcb6
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 25 10:57:35 2012 -0700
+
+ osd: generate past intervals in parallel on boot
+
+ Even though we aggressively share past_intervals with notifies etc, it is
+ still possible for an osd to get buried behind a pile of old maps and need
+ to generate these if it has been out of the cluster for a while. This has
+ happened to us in the past but, sadly, we did not merge the work then.
+ On the bright side, this implementation is much much much cleaner than the
+ old one because of the pg_interval_t helper we've since switched to.
+
+ On bootup, we look at the intervals each pg needs and calculate the union,
+ and then iterate over that map range. The inner bit of the loop is
+ functionally identical to PG::build_past_intervals(), keeping the per-pg
+ state in the pistate struct.
+
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
+ Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit fca65ff52a5f7d49bcac83b3b2232963a879e446
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 25 10:58:07 2012 -0700
+
+ osd: move calculation of past_interval range into helper
+
+ PG::generate_past_intervals() first calculates the range over which it
+ needs to generate past intervals. Do this in a helper function.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
+ Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit 5979351ef3d3d03bced9286f79cbc22524c4a8de
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 25 10:58:28 2012 -0700
+
+ osd: fix map epoch boot condition
+
+ We only want to join the cluster if we can catch up to the latest
+ osdmap with a small number of maps, in this case a single map message.
+
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+ Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 8c7186d02627f8255273009269d50955172efb52
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 24 20:18:01 2012 -0700
+
+ mon: ignore pgtemp messages from down osds
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b17f54671f350fd4247f895f7666d46860736728
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 24 20:16:04 2012 -0700
+
+ mon: ignore osd_alive messages from down osds
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 7dfdf4f8de16155edd434534e161e06ba7c79d7d
+Author: Josh Durgin <josh.durgin@inktank.com>
+Date: Mon Jul 23 14:05:53 2012 -0700
+
+ librbd: replace assign_bid with client id and random number
+
+ The assign_bid method has issues with replay because it is a write
+ that also returns data. This means that the replayed operation would
+ return success, but no data, and cause a create to fail. Instead, let
+ the client set the bid based on its global id and a random number.
+
+ This only affects the creation of new images, since the bid is put
+ into an opaque string as part of the object prefix.
+
+ Keep the server side assign_bid around in case there are old clients
+ still using it.
+
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit dc2d67112163bee8b111f75ae3e3ca42884b09b4
+Author: Dan Mick <dan.mick@inktank.com>
+Date: Mon Jul 9 14:11:23 2012 -0700
+
+ librados: add new constructor to form a Rados object from IoCtx
+
+ This creates a separate reference to an existing connection, for
+ use when a client holding IoCtx needs to consult another (say,
+ for rbd cloning)
+
+ Signed-off-by: Dan Mick <dan.mick@inktank.com>
+ Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit c99671201de9d9cdf03bbf0f4e28e8afb70c280c
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 18 19:49:58 2012 -0700
+
+ add CRUSH_TUNABLES feature bit
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 0b579546cfddec35095b2aec753028d8e63f3533
+Author: Josh Durgin <josh.durgin@inktank.com>
+Date: Wed Jul 18 10:24:58 2012 -0700
+
+ ObjectCacher: fix cache_bytes_hit accounting
+
+ Misses are not hits!
+
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit 2869039b79027e530c2863ebe990662685e4bbe6
+Author: Pascal de Bruijn | Unilogic Networks B.V <pascal@unilogicnetworks.net>
+Date: Wed Jul 11 15:23:16 2012 +0200
+
+ Robustify ceph-rbdnamer and adapt udev rules
+
+ Below is a patch which makes the ceph-rbdnamer script more robust and
+ fixes a problem with the rbd udev rules.
+
+ On our setup we encountered a symlink which was linked to the wrong rbd:
+
+ /dev/rbd/mypool/myrbd -> /dev/rbd1
+
+ While that link should have gone to /dev/rbd3 (on which a
+ partition /dev/rbd3p1 was present).
+
+ The old udev rule passes %n to the ceph-rbdnamer script. The problem
+ with %n is that it results in a value of 3 for rbd3 but a value of
+ 1 for rbd3p1, so it can't be depended upon for rbd naming.
+
+ In the patch below the ceph-rbdnamer script is made more robust and
+ can now be called in various ways:
+
+ /usr/bin/ceph-rbdnamer /dev/rbd3
+ /usr/bin/ceph-rbdnamer /dev/rbd3p1
+ /usr/bin/ceph-rbdnamer rbd3
+ /usr/bin/ceph-rbdnamer rbd3p1
+ /usr/bin/ceph-rbdnamer 3
+
+ Even with all these different styles of calling the modified script, it
+ should now return the same rbdname. This change "has" to be combined
+ with calling it from udev with %k though.
+
+ With that fixed, we hit the second problem. We ended up with:
+
+ /dev/rbd/mypool/myrbd -> /dev/rbd3p1
+
+ So the rbdname was symlinked to the partition on the rbd instead of the
+ rbd itself. So what probably went wrong is udev discovering the disk and
+ running ceph-rbdnamer which resolved it to myrbd so the following
+ symlink was created:
+
+ /dev/rbd/mypool/myrbd -> /dev/rbd3
+
+ However partitions would be discovered next and ceph-rbdnamer would be
+ run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
+ the previous correct symlink being overwritten with a faulty one:
+
+ /dev/rbd/mypool/myrbd -> /dev/rbd3p1
+
+ The solution to the problem is in differentiating between disks and
+ partitions in udev and handling them slightly differently. So with the
+ patch below partitions now get their own symlinks in the following style
+ (which is fairly consistent with other udev rules):
+
+ /dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1
+
+ Please let me know any feedback you have on this patch or the approach
+ used.
+
+ Regards,
+ Pascal de Bruijn
+ Unilogic B.V.
+
+ Signed-off-by: Pascal de Bruijn <pascal@unilogicnetworks.net>
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
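The naming scheme described in the commit above can be sketched as follows. This is a Python illustration under stated assumptions, not the ceph-rbdnamer script itself (which is a shell script driven by udev): `parse_rbd_device` and `symlink_name` are hypothetical names, and the pool and image names are taken from the example in the commit message.

```python
import re

# Accepts "/dev/rbd3", "/dev/rbd3p1", "rbd3", "rbd3p1" and "3".
_DEVICE = re.compile(r"^(?:/dev/)?(?:rbd)?(\d+)(?:p(\d+))?$")


def parse_rbd_device(arg):
    """Return (device_number, partition_number_or_None)."""
    match = _DEVICE.match(arg)
    if match is None:
        raise ValueError("not an rbd device name: %r" % arg)
    device = int(match.group(1))
    partition = int(match.group(2)) if match.group(2) else None
    return device, partition


def symlink_name(pool, image, partition):
    """Whole disks get the plain name; partitions get a -partN suffix."""
    base = "/dev/rbd/%s/%s" % (pool, image)
    if partition is None:
        return base
    return "%s-part%d" % (base, partition)
```

Because the partition number is kept separate from the device number, the symlink for a partition no longer overwrites the symlink for the disk.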
+commit 426384f6beccabf9e9b9601efcb8147904ec97c2
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Jul 16 16:02:14 2012 -0700
+
+ log: apply log_level to stderr/syslog logic
+
+ In non-crash situations, we want to make sure the message is both below the
+ syslog/stderr threshold and also below the normal log threshold. Otherwise
+ we get anything we gather on those channels, even when the log level is
+ low.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 8dafcc5c1906095cb7d15d648a7c1d7524df3768
+Author: Sage Weil <sage@inktank.com>
+Date: Mon Jul 16 15:40:53 2012 -0700
+
+ log: fix event gather condition
+
+ We should gather an event if it is below the log or gather threshold.
+
+ Previously we were only gathering if we were going to print it, which makes
+ the dump no more useful than what was already logged.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
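The two logging conditions in the commits above can be sketched as simple predicates. This is an interpretation in Python, not the Ceph log code: the function names are hypothetical, and it assumes the usual convention that a lower numeric level means a more important message.

```python
def should_gather(level, log_level, gather_level):
    """Gather an event if it is within the log OR the gather threshold."""
    return level <= max(log_level, gather_level)


def should_send_to_channel(level, log_level, channel_level):
    """In non-crash situations, emit to stderr/syslog only if the message
    is within both the channel threshold and the normal log threshold."""
    return level <= channel_level and level <= log_level
```

Under these assumptions, an event at level 10 with log level 1 and gather level 20 is kept in memory for a later dump but is not sent to stderr or syslog.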
+commit ec5cd6def9817039704b6cc010f2797a700d8500
+Author: Samuel Just <sam.just@inktank.com>
+Date: Mon Jul 16 13:11:24 2012 -0700
+
+ PG::RecoveryState::Stray::react(LogEvt&): reset last_pg_scrub
+
+ We need to reset the last_pg_scrub data in the osd since we
+ are replacing the info.
+
+ Probably fixes #2453
+
+ In cases like 2453, we hit the following backtrace:
+
+ 0> 2012-05-19 17:24:09.113684 7fe66be3d700 -1 osd/OSD.h: In function 'void OSD::unreg_last_pg_scrub(pg_t, utime_t)' thread 7fe66be3d700 time 2012-05-19 17:24:09.095719
+ osd/OSD.h: 840: FAILED assert(last_scrub_pg.count(p))
+
+ ceph version 0.46-313-g4277d4d (commit:4277d4d3378dde4264e2b8d211371569219c6e4b)
+ 1: (OSD::unreg_last_pg_scrub(pg_t, utime_t)+0x149) [0x641f49]
+ 2: (PG::proc_primary_info(ObjectStore::Transaction&, pg_info_t const&)+0x5e) [0x63383e]
+ 3: (PG::RecoveryState::ReplicaActive::react(PG::RecoveryState::MInfoRec const&)+0x4a) [0x633eda]
+ 4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::MQuery>, boost::statechart::custom_reaction<PG::RecoveryState::MInfoRec>, boost::statechart::custom_reaction<PG::RecoveryState::MLogRec> >, boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x130) [0x6466a0]
+ 5: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x81) [0x646791]
+ 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x63dfcb]
+ 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x63e0f1]
+ 8: (PG::RecoveryState::handle_info(int, pg_info_t&, PG::RecoveryCtx*)+0x177) [0x616987]
+ 9: (OSD::handle_pg_info(std::tr1::shared_ptr<OpRequest>)+0x665) [0x5d3d15]
+ 10: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x2a0) [0x5d7370]
+ 11: (OSD::_dispatch(Message*)+0x191) [0x5dd4a1]
+ 12: (OSD::ms_dispatch(Message*)+0x153) [0x5ddda3]
+ 13: (SimpleMessenger::dispatch_entry()+0x863) [0x77fbc3]
+ 14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x746c5d]
+ 15: (()+0x7efc) [0x7fe679b1fefc]
+ 16: (clone()+0x6d) [0x7fe67815089d]
+ NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
+
+ Because we don't clear the scrub state before resetting info,
+ the last_scrub_stamp state in the info.history structure
+ changes without updating the osd state resulting in the
+ above assert failure.
+
+ Backport: stable
+
+ Signed-off-by: Samuel Just <sam.just@inktank.com>
+
+commit 248cfaddd0403c7bae8e1533a3d2e27d1a335b9b
+Author: Samuel Just <sam.just@inktank.com>
+Date: Mon Jul 9 17:57:03 2012 -0700
+
+ ReplicatedPG: don't warn if backfill peer stats don't match
+
+ pinfo.stats might be wrong if we did log-based recovery on the
+ backfilled portion in addition to continuing backfill.
+
+ bug #2750
+
+ Signed-off-by: Samuel Just <sam.just@inktank.com>
+
+commit bcb1073f9171253adc37b67ee8d302932ba1667b
+Author: Sage Weil <sage@inktank.com>
+Date: Sun Jul 15 20:30:34 2012 -0700
+
+ mon/MonitorStore: always O_TRUNC when writing states
+
+ It is possible for a .new file to already exist, potentially with a
+ larger size. This would happen if:
+
+ - we were proposing a different value
+ - we crashed (or were stopped) before it got renamed into place
+ - after restarting, a different value was proposed and accepted.
+
+ This isn't so unlikely for the log state machine, where we're
+ aggregating random messages. O_TRUNC ensures we avoid getting the tail
+ end of some previous junk.
+
+ I observed #2593 and found that a logm state value had a larger size on
+ one mon (after slurping) than the others, pointing to put_bl_sn_map().
+
+ While we are at it, O_TRUNC put_int() too; the same type of bug is
+ possible there, too.
+
+ Fixes: #2593
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
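The failure mode described in the commit above can be reproduced with a short Python sketch. The file name `logm.new` and the helper names are illustrative assumptions; only the mechanism (a shorter write over a longer leftover file, with and without `O_TRUNC`) comes from the commit message.

```python
import os
import tempfile


def write_value(path, data, truncate):
    """Write data at offset 0, optionally truncating any existing file."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Return (contents without O_TRUNC, contents with O_TRUNC)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "logm.new")
        # A longer value left behind by an interrupted proposal.
        write_value(path, b"old-proposed-value", truncate=False)
        # A shorter value written later without O_TRUNC keeps the old tail.
        write_value(path, b"new", truncate=False)
        with open(path, "rb") as handle:
            stale = handle.read()
        # The same write with O_TRUNC leaves only the new value.
        write_value(path, b"new", truncate=True)
        with open(path, "rb") as handle:
            clean = handle.read()
        return stale, clean
```

Without truncation the file holds the new bytes followed by the tail of the old value, which matches the larger-than-expected state value observed in #2593.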
+commit 41a570778a51fe9a36a5b67a177d173889e58363
+Author: Sage Weil <sage@inktank.com>
+Date: Sat Jul 14 14:31:34 2012 -0700
+
+ osd: base misdirected op role calc on acting set
+
+ We want to look at the acting set here, nothing else. This was causing us
+ to erroneously queue ops for later (wasting memory) and to erroneously
+ print out a 'misdirected op' message in the cluster log (confusion and
+ incorrect [but ignored] -ENXIO reply).
+
+ Fixes: #2022
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b3d077c61e977e8ebb91288aa2294fb21c197fe7
+Author: Josh Durgin <josh.durgin@inktank.com>
+Date: Fri Jul 13 09:42:20 2012 -0700
+
+ qa: download tests from specified branch
+
+ These python tests aren't installed, so they need to be downloaded
+
+ Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
+
+commit e855cb247b5a9eda6845637e2da5b6358f69c2ed
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Mon Jun 25 09:47:37 2012 -0700
+
+ rgw: don't override subuser perm mask if perm not specified
+
+ Bug #2650. We were overriding subuser perm mask whenever subuser
+ was modified, even if perm mask was not passed.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit d6c766ea425d87a2f2405c08dcec66f000a4e1a0
+Author: James Page <james.page@ubuntu.com>
+Date: Wed Jul 11 11:34:21 2012 -0700
+
+ debian: fix ceph-fs-common-dbg depends
+
+ Signed-off-by: James Page <james.page@ubuntu.com>
+
+commit 95e8d87bc3fb12580e4058401674b93e19df6e02
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Wed Jul 11 11:52:24 2012 -0700
+
+ rados tool: remove -t param option for target pool
+
+ Bug #2772. This fixes an issue that was introduced when we
+ added the 'rados cp' command. The -t param was already used
+ for rados bench. With this change the only way to specify
+ a target pool is using --target-pool.
+ Though this problem is post argonaut, the 'rados cp' command
+ has been backported, so we need this fix there too.
+
+ Backport: argonaut
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 5b10778399d5bee602e57035df7d40092a649c06
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 11 09:19:00 2012 -0700
+
+ Makefile: don't install crush headers
+
+ This is leftover from when we built a libcrush.so. We can re-add when we
+ start doing that again.
+
+ Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 35b13266923f8095650f45562d66372e618c8824
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 10 13:18:27 2012 -0700
+
+ msgr: take over existing Connection on Pipe replacement
+
+ If a new pipe/socket is taking over an existing session, it should also
+ take over the Connection* associated with the existing session. Because
+ we cannot clear existing->connection_state, we just take another reference.
+
+ Clean up the comments a bit while we're here.
+
+ This affects MDS<->client sessions when reconnecting after a socket fault.
+ It probably also affects intra-cluster (osd/osd, mds/mds, mon/mon)
+ sessions as well, but I did not confirm that.
+
+ Backport: argonaut
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b387077b1d019ee52b28bc3bc5305bfb53dfd892
+Author: Sage Weil <sage@inktank.com>
+Date: Sun Jul 8 20:33:12 2012 -0700
+
+ debian: include librados-config in librados-dev
+
+ Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 03c2dc244af11b711e2514fd5f32b9bfa34183f6
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 13:04:28 2012 -0700
+
+ lockdep: increase max locks
+
+ Hit this limit with the rados api tests.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b554d112c107efe78ec64f85b5fe588f1e7137ce
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 12:07:28 2012 -0700
+
+ config: add unlocked version of get_my_sections; use it internally
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 01da287b8fdc07262be252f1a7c115734d3cc328
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 08:20:06 2012 -0700
+
+ config: fix lock recursion in get_val_from_conf_file()
+
+ Introduce a private, already-locked version.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
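The pattern named in this commit (a public method that takes the lock, plus a private helper that assumes the lock is already held) can be sketched as below. The sketch is in Python rather than Ceph's C++, and the class, method, and key names are hypothetical.

```python
import threading


class Config:
    """Hypothetical config object guarded by a non-recursive lock."""

    def __init__(self, values):
        self._lock = threading.Lock()  # non-recursive: re-acquiring would deadlock
        self._values = dict(values)

    def get_val(self, key):
        # Public entry point: takes the lock, then delegates.
        with self._lock:
            return self._get_val_locked(key)

    def _get_val_locked(self, key):
        # Private helper: the caller must already hold self._lock.
        return self._values.get(key)

    def expand(self, key):
        # This method holds the lock and needs other values. Calling
        # get_val() here would take the lock a second time, so it uses
        # the already-locked helper instead.
        with self._lock:
            base = self._get_val_locked("base_dir")
            return f"{base}/{self._get_val_locked(key)}"


conf = Config({"base_dir": "/var/lib/ceph", "name": "osd.0"})
path = conf.expand("name")
```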
+
+commit c73c64a0f722477a5b0db93da2e26e313a5f52ba
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 08:15:08 2012 -0700
+
+ config: fix recursive lock in parse_config_files()
+
+ The _impl() helper is only called from parse_config_files(); don't retake
+ the lock.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 6646e891ff0bd31c935d1ce0870367b1e086ddfd
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 18:51:02 2012 -0700
+
+ rgw: initialize fields of RGWObjEnt
+
+ This fixes various valgrind warnings triggered by the s3test
+ test_object_create_unreadable.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b33553aae63f70ccba8e3d377ad3068c6144c99a
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Fri Jul 6 13:14:53 2012 -0700
+
+ rgw: handle response-* params
+
+ Handle response-* params that set response header field values.
+ Fixes #2734, #2735.
+ Backport: argonaut
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 74f687501a8a02ef248a76f061fbc4d862a9abc4
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jul 4 13:59:04 2012 -0700
+
+ osd: add missing formatter close_section() to scrub status
+
+ Also add braces to make the open/close matchups easier to see. Broken
+ by f36617392710f9b3538bfd59d45fd72265993d57.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 020b29961303b12224524ddf78c0c6763a61242e
+Author: Mike Ryan <mike.ryan@inktank.com>
+Date: Wed Jun 27 14:14:30 2012 -0700
+
+ pg: report scrub status
+
+ Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
+
+commit db6d83b3ed51c07b361b27d2e5ce3227a51e2c60
+Author: Mike Ryan <mike.ryan@inktank.com>
+Date: Wed Jun 27 13:30:45 2012 -0700
+
+ pg: track who we are waiting for maps from
+
+ Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
+
+commit e1d4855fa18b1cda85923ad9debd95768260d4eb
+Author: Mike Ryan <mike.ryan@inktank.com>
+Date: Tue Jun 26 16:25:27 2012 -0700
+
+ pg: reduce scrub write lock window
+
+ Wait for all replicas to construct the base scrub map before finalizing
+ the scrub and locking out writes.
+
+ Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
+
+commit 27409aa1612c1512bf393de22b62bbfe79b104c1
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jul 5 15:52:51 2012 -0700
+
+ rgw: don't store bucket info indexed by bucket_id
+
+ Issue #2701. This info wasn't really used anywhere and we weren't
+ removing it. It was also sharing the same pool namespace as the
+ info indexed by bucket name, which is bad.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 9814374a2b40e15c13eb03ce6b8e642b0f7f93e4
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jul 5 14:59:22 2012 -0700
+
+ test_rados_tool.sh: test copy pool
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit d75100667a539baf47c79d752b787ed5dcb51d7a
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jul 5 13:42:23 2012 -0700
+
+ rados tool: copy object in chunks
+
+ Instead of reading the entire object and then writing it,
+ we read it in chunks.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
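Reading and writing in chunks, as opposed to loading the whole object first, can be sketched with ordinary file-like objects. This is a generic illustration, not the rados tool's implementation; the function name and the deliberately tiny chunk size are assumptions.

```python
import io

CHUNK_SIZE = 4  # tiny on purpose so the loop runs several times (an assumption)


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time; return the total bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of data
            break
        dst.write(chunk)
        total += len(chunk)
    return total


src = io.BytesIO(b"0123456789")
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
```

Peak memory use is bounded by the chunk size rather than by the object size.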
+
+commit 16ea64fbdebb7a74e69e80a18d98f35d68b8d9a1
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Fri Jun 29 14:43:00 2012 -0700
+
+ rados tool: copy entire pool
+
+ A new rados tool command that copies an entire pool
+ into another existing pool.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 960c2124804520e81086df97905a299c8dd4e08c
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Fri Jun 29 14:09:08 2012 -0700
+
+ rados tool: copy object
+
+ New rados command: rados cp <src-obj> [dest-obj]
+
+ Requires specifying source pool. Target pool and locator can be specified.
+ The new command preserves object xattrs and omap data.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 23d31d3e2aa7f2b474a7b8e9d40deb245d8be9de
+Author: Sage Weil <sage@inktank.com>
+Date: Fri Jul 6 08:47:44 2012 -0700
+
+ ceph.spec.in: add ceph-disk-{activate,prepare}
+
+ Reported-by: Jimmy Tang <jtang@tchpc.tcd.ie>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit ea11c7f9d8fd9795e127cfd7e8a1f28d4f5472e9
+Author: Wido den Hollander <wido@widodh.nl>
+Date: Thu Jul 5 15:29:54 2012 +0200
+
+ Allow URL-safe base64 cephx keys to be decoded.
+
+ In these cases + and / are replaced by - and _ to prevent problems when using
+ the base64 strings in URLs.
+
+ Signed-off-by: Wido den Hollander <wido@widodh.nl>
+ Signed-off-by: Sage Weil <sage@inktank.com>
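The character substitution described in this commit can be shown with Python's standard `base64` module. This is only a sketch of the idea, not the cephx decoder; the helper name `decode_key` is hypothetical.

```python
import base64


def decode_key(text):
    """Decode a base64 string that uses either the standard or URL-safe alphabet."""
    # Map the URL-safe characters back to the standard alphabet first.
    standard = text.replace("-", "+").replace("_", "/")
    return base64.b64decode(standard)


raw = bytes([0xFB, 0xFF, 0xFE])                # chosen so the encoding contains + and /
std = base64.b64encode(raw).decode()           # standard alphabet
url = base64.urlsafe_b64encode(raw).decode()   # URL-safe alphabet
same = decode_key(std) == decode_key(url) == raw
```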
+
+commit f67fe4e368b5f250f0adfb183476f5f294e8a529
+Author: Wido den Hollander <wido@widodh.nl>
+Date: Wed Jul 4 15:46:04 2012 +0200
+
+ librados: Bump the version to 0.48
+
+ Signed-off-by: Wido den Hollander <wido@widodh.nl>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 35b9ec881aecf84b3a49ec0395d7208de36dc67d
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Tue Jun 26 17:28:51 2012 -0700
+
+ rgw-admin: use correct modifier with strptime
+
+ Bug #2658: used %I (12h) instead of %H (24h)
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
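The difference between the two modifiers can be seen with Python's `time.strptime`, which uses the same directive names. The C library function used by radosgw-admin may report the failure differently, so treat this as an illustration of the directives only.

```python
import time

stamp = "2012-06-26 17:28:51"

# %H accepts hours 00-23, so a 24-hour timestamp parses.
parsed = time.strptime(stamp, "%Y-%m-%d %H:%M:%S")

# %I accepts hours 01-12 only, so the same string is rejected.
try:
    time.strptime(stamp, "%Y-%m-%d %I:%M:%S")
    rejected = False
except ValueError:
    rejected = True
```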
+
+commit da251fe88503d32b86113ee0618db7c446d34853
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jun 21 15:40:27 2012 -0700
+
+ rgw: send both swift x-storage-token and x-auth-token
+
+ older clients need x-storage-token, newer x-auth-token
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 4c19ecb9a34e77e71d523a0a97e17f747bd5767d
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jun 21 15:17:19 2012 -0700
+
+ rgw: radosgw-admin date params now also accept time
+
+ The date format now is "YYYY-MM-DD[ hh:mm:ss]". Got rid of
+ the --time param for the old ops log stuff.
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+ Conflicts:
+
+ src/test/cli/radosgw-admin/help.t
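A parser for the "YYYY-MM-DD[ hh:mm:ss]" format might look like the sketch below. It is written in Python for illustration and is not the radosgw-admin code; the function name is hypothetical, and defaulting a missing time to midnight is an assumption.

```python
from datetime import datetime


def parse_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD hh:mm:ss'.

    Assumption: when the time part is omitted, midnight is used.
    """
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {text!r}")


date_only = parse_date("2012-06-21")
date_time = parse_date("2012-06-21 15:17:19")
```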
+
+commit 6958aeb898fc683159483bfbb798f069a9b5330a
+Author: Yehuda Sadeh <yehuda@inktank.com>
+Date: Thu Jun 21 13:14:47 2012 -0700
+
+ rgw-admin: fix usage help
+
+ s/show/trim
+
+ Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
+
+commit 83c043f803ab2ed74fa9a84ae9237dd7df2a0c57
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 14:07:16 2012 -0700
+
+ radosgw-admin: fix cli test
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 5674158163e9c1d50985796931240b237676b74d
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 11:32:57 2012 -0700
+
+ ceph: fix cli help test
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 151bf0eef59acae2d1fcf3f0feb8b6aa963dc2f6
+Author: Samuel Just <sam.just@inktank.com>
+Date: Tue Jul 3 11:23:16 2012 -0700
+
+ ReplicatedPG: remove faulty scrub assert in sub_op_modify_applied
+
+ This assert assumed that all ops submitted before MOSDRepScrub was
+ submitted were processed by the time that MOSDRepScrub was
+ processed. In fact, MOSDRepScrub's scrub_to may refer to a
+ last_update yet to be seen by the replica.
+
+ Bug #2693
+
+ Signed-off-by: Samuel Just <sam.just@inktank.com>
+
+commit 32833e88a1ad793fa4be86101ce9c22b6f677c06
+Author: Kyle Bader <kyle.bader@dreamhost.com>
+Date: Tue Jul 3 11:20:38 2012 -0700
+
+ ceph: better usage
+
+ Signed-off-by: Kyle Bader <kyle.bader@dreamhost.com>
+
+commit 67455c21879c9c117f6402259b5e2da84524e169
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 09:20:35 2012 -0700
+
+ debian: strip new ceph-mds package
+
+ Reported-by: Amon Ott <a.ott@m-privacy.de>
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b53cdb97d15f9276a9b26bec9f29034149f93358
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jul 3 06:46:10 2012 -0700
+
+ config: remove bad argparse_flag argument in parse_option()
+
+ This is wrong, and thankfully valgrind picks it up.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit f7d4e39740fd2afe82ac40c711bd3fe7a282e816
+Author: Sage Weil <sage@inktank.com>
+Date: Sun Jul 1 17:23:28 2012 -0700
+
+ msgr: restart_queue when replacing existing pipe and taking over the queue
+
+ The queue may have been previously stopped (by discard_queue()), and needs
+ to be restarted.
+
+ Fixes consistent failures from the mon_recovery.py integration tests.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 5dfd2a512d309f7f641bcf7c43277f08cf650b01
+Author: Sage Weil <sage@inktank.com>
+Date: Sun Jul 1 15:37:31 2012 -0700
+
+ msgr: choose incoming connection if ours is STANDBY
+
+ If the connect_seq matches, but our existing connection is in STANDBY, take
+ the incoming one. Otherwise, the other end will wait indefinitely for us
+ to connect but we won't.
+
+ Alternatively, we could "win" the race and trigger a connection by sending
+ a keepalive (or similar), but that is more work; we may as well accept the
+ incoming connection we have now.
+
+ This removes STANDBY from the acceptable WAIT case states. It also keeps
+ responsibility squarely on the shoulders of the peer with something to
+ deliver.
+
+ Without this patch, a 3-osd vstart cluster with
+ 'ms inject socket failures = 100' and rados bench write -b 4096 would start
+ generating slow request warnings after a few minutes due to the osds
+ failing to connect to each other. With the patch, I complete a 10 minute
+ run without problems.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit b7007a159f6d941fa8313a24af5810ce295b36ca
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jun 28 17:50:47 2012 -0700
+
+ msgr: preserve incoming message queue when replacing pipes
+
+ If we replace an existing pipe with a new one, move the incoming queue
+ of messages that have not yet been dispatched over to the new Pipe so that
+ they are not lost.
+
+ Alternatively, we could set in_seq = existing->in_seq - existing->in_qlen,
+ but that would make the other end resend those messages, which is a waste
+ of bandwidth.
+
+ Very easy to reproduce the original bug with 'ms inject socket failures'.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
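The idea of handing undispatched messages to the replacement pipe can be modeled in a few lines. This is a simplified Python model, not the messenger code; the `Pipe` class, its `in_q` attribute, and `replace_pipe` are hypothetical names.

```python
from collections import deque


class Pipe:
    """Hypothetical minimal pipe: only the undelivered incoming queue is modeled."""

    def __init__(self):
        self.in_q = deque()


def replace_pipe(existing):
    """Create a replacement pipe that inherits the undispatched messages."""
    new = Pipe()
    new.in_q.extend(existing.in_q)  # keep arrival order
    existing.in_q.clear()           # the old pipe no longer owns them
    return new


old = Pipe()
old.in_q.extend(["m1", "m2"])
new = replace_pipe(old)
```

Because the messages move with the session, the peer does not need to resend them.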
+
+commit 1f3a722e150f9f27fe7919e9579b5a88dcd15639
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jun 28 17:45:24 2012 -0700
+
+ msgr: move dispatch_entry into DispatchQueue class
+
+ A bit cleaner.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 03445290dad5b1213dd138cacf46e379400201c9
+Author: Sage Weil <sage@inktank.com>
+Date: Thu Jun 28 17:38:34 2012 -0700
+
+ msgr: move incoming queue to separate class
+
+ This extricates the incoming queue and its funky relationship with
+ DispatchQueue from Pipe and moves it into IncomingQueue. There is now a
+ single IncomingQueue attached to each Pipe. DispatchQueue is now no
+ longer tied to Pipe.
+
+ This modularizes the code a bit better (though that is still a work in
+ progress) and (more importantly) will make it possible to move the
+ incoming messages from one pipe to another in accept().
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 0dbc54169512da776c16161ec3b8fa0b3f08e248
+Author: Sage Weil <sage@inktank.com>
+Date: Wed Jun 27 17:06:40 2012 -0700
+
+ msgr: make D_CONNECT constant non-zero, fix ms_handle_connect() callback
+
+ A while ago we inadvertently broke ms_handle_connect() callbacks because
+ of a check for m being non-zero in the dispatch_entry() thread. Adjust the
+ enums so that they get delivered again.
+
+ This fixes hangs when, for example, the ceph tool sends a command, gets a
+ connection reset, and doesn't get the connect callback to resend after
+ reconnecting to a new monitor.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
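The class of bug described here, a "non-zero" check that also filters out a legitimate code whose value happens to be zero, can be reproduced in a simplified model. This is not the dispatch_entry() code; the function, the event codes, and the handler table are all hypothetical.

```python
def dispatch(queue, handlers):
    """Deliver queued items, skipping entries that fail a non-zero check."""
    delivered = []
    for m in queue:
        if not m:  # meant to skip empty entries; also skips a code of 0
            continue
        delivered.append(handlers.get(m, "message"))
    return delivered


# Hypothetical event codes. With the connect code equal to 0 the event is dropped.
broken = dispatch([0, 5], {0: "connect"})
# Making the code non-zero lets the same check pass it through.
fixed = dispatch([1, 5], {1: "connect"})
```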
+
+commit 2429556a51e8f60b0d9bdee71ef7b34b367f2f38
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jun 26 17:10:40 2012 -0700
+
+ msgr: fix pipe replacement assert
+
+ We may replace an existing pipe in the STANDBY state if the previous
+ attempt failed during accept() (see previous patches).
+
+ This might fix #1378.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit 204bc594be1a6046d1b362693d086b49294c2a27
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jun 26 17:07:31 2012 -0700
+
+ msgr: do not try to reconnect con with CLOSED pipe
+
+ If we have a con with a closed pipe, drop the message. For lossless
+ sessions, the state will be STANDBY if we should reconnect. For lossy
+ sessions, we will end up with CLOSED and we *should* drop the message.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>
+
+commit e6ad6d25a58b8e34a220d090d01e26293c2437b4
+Author: Sage Weil <sage@inktank.com>
+Date: Tue Jun 26 17:06:41 2012 -0700
+
+ msgr: move to STANDBY if we replace during accept and then fail
+
+ If we replace an existing pipe during accept() and then fail, move to
+ STANDBY so that our connection state (connect_seq, etc.) is preserved.
+ Otherwise, we will throw out that information and falsely trigger a
+ RESETSESSION on the next connection attempt.
+
+ Signed-off-by: Sage Weil <sage@inktank.com>