diff options
Diffstat (limited to 'networking-odl/doc/source/specs/journal-recovery.rst')
-rw-r--r-- | networking-odl/doc/source/specs/journal-recovery.rst | 152 |
1 files changed, 0 insertions, 152 deletions
diff --git a/networking-odl/doc/source/specs/journal-recovery.rst b/networking-odl/doc/source/specs/journal-recovery.rst deleted file mode 100644 index 1132485..0000000 --- a/networking-odl/doc/source/specs/journal-recovery.rst +++ /dev/null @@ -1,152 +0,0 @@ -.. - This work is licensed under a Creative Commons Attribution 3.0 Unported - License. - - http://creativecommons.org/licenses/by/3.0/legalcode - -================ -Journal Recovery -================ - -https://blueprints.launchpad.net/networking-odl/+spec/journal-recovery - -Journal entries in the failed state need to be handled somehow. This spec will -try to address the issue and propose a solution. - -Problem Description -=================== - -Currently there is no handling for Journal entries that reach the failed state. -A journal entry can reach the failed state for several reasons, some of which -are: - -* Reached maximum failed attempts for retrying the operation. - -* Inconsistency between ODL and the Neutron DB. - - * For example: An update fails because the resource doesn't exist in ODL. - -* Bugs that can lead to failure to sync up. - -These entries will be left in the journal table forever which is a bit wasteful -since they take up some space on the DB storage and also affect the performance -of the journal table. -Albeit each entry has a negligble effect on it's own, the impact of a large -number of such entries can become quite significant. - -Proposed Change -=============== - -A "journal recovery" routine will run as part of the current journal -maintenance process. -This routine will scan the journal table for rows in the "failed" state and -will try to sync the resource for that entry. - -The procedure can be best described by the following flow chart: - -asciiflow:: - - +-----------------+ - | For each entry | - | in failed state | - +-------+---------+ - | - +-------v--------+ - | Query resource | - | on ODL (REST) | - +-----+-----+----+ - | | +-----------+ - Resource | | Determine | - exists +--Resource doesn't exist--> operation | - | | type | - +-----v-----+ +-----+-----+ - | Determine | | - | operation | | - | type | | - +-----+-----+ | - | +------------+ | - +--Create------> Mark entry <--Delete--+ - | | completed | | - | +----------^-+ Create/ - | | Update - | | | - | +------------+ | +-----v-----+ - +--Delete--> Mark entry | | | Determine | - | | pending | | | parent | - | +---------^--+ | | relation | - | | | +-----+-----+ - +-----v------+ | | | - | Compare to +--Different--+ | | - | resource | | | - | in DB +--Same------------+ | - +------------+ | - | - +-------------------+ | - | Create entry for <-----Has no parent------+ - | resource creation | | - +--------^----------+ Has a parent - | | - | +---------v-----+ - +------Parent exists------+ Query parent | - | on ODL (REST) | - +---------+-----+ - +------------------+ | - | Create entry for <---Parent doesn't exist--+ - | parent creation | - +------------------+ - -For every error during the process the entry will remain in failed state but -the error shouldn't stop processing of further entries. - - -The implementation could be done in two phases where the parent handling is -done in a a second pahse. -For the first phase if we detect an entry that is in failed for a create/update -operation and the resource doesn't exist on ODL we create a new "create -resource" journal entry for the resource. - -This propsal utilises the journal mechanism for it's operation while the only -part that deviates from the standard mode of operation is when it queries ODL -directly. This direct query has to be done to get ODL's representation of the -resource. - -Performance Impact ------------------- - -The maintenance thread will have another task to handle. This can lead to -longer processing time and even cause the thread to skip an iteration. -This is not an issue since the maintenance thread runs in parallel and doesn't -directly impact the responsiveness of the system. - -Since most operations here involve I/O then CPU probably won't be impacted. - -Network traffic would be impacted slightly since we will attempt to fetch the -resource each time from ODL and we might attempt to fetch it's parent. -This is however negligble as we do this only for failed entries, which are -expected to appear rarely. - - -Alternatives ------------- - -The partial sync process could make this process obsolete (along with full -sync), but it's a far more complicated and problematic process. -It's better to start with this process which is more lightweight and doable -and consider partial sync in the future. - - -Assignee(s) -=========== - -Primary assignee: - mkolesni <mkolesni@redhat.com> - -Other contributors: - None - - -References -========== - -https://goo.gl/IOMpzJ - |