Diffstat (limited to 'qemu/docs/specs/ppc-spapr-hotplug.txt')
-rw-r--r-- | qemu/docs/specs/ppc-spapr-hotplug.txt | 353 |
1 files changed, 0 insertions, 353 deletions
diff --git a/qemu/docs/specs/ppc-spapr-hotplug.txt b/qemu/docs/specs/ppc-spapr-hotplug.txt
deleted file mode 100644
index 631b0cada..000000000
--- a/qemu/docs/specs/ppc-spapr-hotplug.txt
+++ /dev/null
@@ -1,353 +0,0 @@
-= sPAPR Dynamic Reconfiguration =
-
-sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration
-to handle hotplugging of dynamic "physical" resources like PCI cards, or
-"logical"/paravirtual resources like memory, CPUs, and "physical"
-host-bridges, which are generally managed by the host/hypervisor and provided
-to guests as virtualized resources. The specifics of dynamic-reconfiguration
-are documented extensively in PAPR+ v2.7, Section 13.1. This document
-provides a summary of that information as it applies to the implementation
-within QEMU.
-
-== Dynamic-reconfiguration Connectors ==
-
-To manage hotplug/unplug of these resources, a firmware abstraction known as
-a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
-resource to the guest, and provide an interface for the guest to manage
-configuration/removal of the resource associated with it.
-
-== Device-tree description of DRCs ==
-
-A set of 4 Open Firmware device tree array properties are used to describe
-the name/index/power-domain/type of each DRC allocated to a guest at
-boot-time. There may be multiple sets of these arrays, rooted at different
-paths in the device tree depending on the type of resource the DRCs manage.
-
-In some cases, the DRCs themselves may be provided by a dynamic resource,
-such as the DRCs managing PCI slots on a hotplugged PHB. In this case the
-arrays would be fetched as part of the device tree retrieval interfaces
-for hotplugged resources described under "Guest->Host interface".
-
-The array properties are described below.
-Each entry/element in an array describes the DRC identified by the element
-in the corresponding position of ibm,drc-indexes:
-
-ibm,drc-names:
-  first 4-bytes: BE-encoded integer denoting the number of entries
-  each entry: a NULL-terminated <name> string encoded as a byte array
-
-  <name> values for logical/virtual resources are defined in PAPR+ v2.7,
-  Section 13.5.2.4, and basically consist of the type of the resource
-  followed by a space and a numerical value that's unique across resources
-  of that type.
-
-  <name> values for "physical" resources such as PCI or VIO devices are
-  defined as being "location codes", which are the "location labels" of
-  each encapsulating device, starting from the chassis down to the
-  individual slot for the device, concatenated by a hyphen. This provides
-  a mapping of resources to a physical location in a chassis for debugging
-  purposes. For QEMU, this mapping is less important, so we assign a
-  location code that conforms to naming specifications, but is simply a
-  location label for the slot by itself to simplify the implementation.
-  The naming convention for location labels is documented in detail in
-  PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
-  for PCI/VIO device slots, where <n> is unique across all PCI/VIO
-  device slots.
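The ibm,drc-names layout described above (a big-endian entry count followed by
NUL-terminated strings) can be walked with a few lines of C. The following is a
minimal sketch, not code from QEMU or the guest kernel: the names be32_read and
drc_names_parse are hypothetical, and the property is assumed to have already
been read into a flat byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper (not a QEMU or kernel API): walk the raw bytes of an
 * ibm,drc-names property. Stores a pointer to each name in names[] (at most
 * max of them) and returns the entry count, or -1 if the buffer is truncated.
 */
static long drc_names_parse(const uint8_t *prop, size_t len,
                            const char **names, size_t max)
{
    assert(prop != NULL && names != NULL);
    if (len < 4) {
        return -1;                        /* no room for the entry count */
    }

    uint32_t count = be32_read(prop);
    size_t off = 4;

    for (uint32_t i = 0; i < count; i++) {
        if (off >= len) {
            return -1;                    /* ran out of bytes */
        }
        const uint8_t *nul = memchr(prop + off, 0, len - off);
        if (nul == NULL) {
            return -1;                    /* unterminated string */
        }
        if (i < max) {
            names[i] = (const char *)(prop + off);
        }
        off = (size_t)(nul - prop) + 1;   /* skip past the terminator */
    }
    return (long)count;
}
```

The returned pointers alias the property buffer, so the buffer must outlive
them. Entry i of the result names the DRC whose index sits at position i of
ibm,drc-indexes.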
-
-ibm,drc-indexes:
-  first 4-bytes: BE-encoded integer denoting the number of entries
-  each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
-    in the machine
-
-  <index> is arbitrary, but in the case of QEMU we try to maintain the
-  convention used to assign them to pSeries guests on pHyp:
-
-    bit[31:28]: integer encoding of <type>, where <type> is:
-                  1 for CPU resource
-                  2 for PHB resource
-                  3 for VIO resource
-                  4 for PCI resource
-                  8 for Memory resource
-    bit[27:0]: integer encoding of <id>, where <id> is unique across
-                 all resources of specified type
-
-ibm,drc-power-domains:
-  first 4-bytes: BE-encoded integer denoting the number of entries
-  each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
-    power domain the resource will be assigned to. In the case of QEMU
-    we associate all resources with a "live insertion" domain, where the
-    power is assumed to be managed automatically. The integer value for
-    this domain is a special value of -1.
-
-
-ibm,drc-types:
-  first 4-bytes: BE-encoded integer denoting the number of entries
-  each entry: a NULL-terminated <type> string encoded as a byte array
-
-  <type> is assigned as follows:
-    "CPU" for a CPU
-    "PHB" for a physical host-bridge
-    "SLOT" for a VIO slot
-    "28" for a PCI slot
-    "MEM" for memory resource
-
-== Guest->Host interface to manage dynamic resources ==
-
-Each DRC is given a globally unique DRC Index, and resources associated with
-a particular DRC are configured/managed by the guest via a number of RTAS
-calls which reference individual DRCs based on the DRC index. This can be
-considered the guest->host interface.
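Since the RTAS calls identify a DRC by its index, it can help to see the
ibm,drc-indexes bit layout (type in bit[31:28], id in bit[27:0]) as code. This
is a minimal sketch of that convention only. The enum and function names are
hypothetical and are not taken from the QEMU sources.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the bit[31:28] table; the enum names are illustrative. */
enum drc_type_code {
    DRC_TYPE_CODE_CPU    = 1,
    DRC_TYPE_CODE_PHB    = 2,
    DRC_TYPE_CODE_VIO    = 3,
    DRC_TYPE_CODE_PCI    = 4,
    DRC_TYPE_CODE_MEMORY = 8,
};

#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    0x0FFFFFFFu

/* Pack a 4-bit type code and a 28-bit id into a DRC index. */
static uint32_t drc_index_make(uint32_t type, uint32_t id)
{
    assert(type <= 0xF && id <= DRC_INDEX_ID_MASK);
    return (type << DRC_INDEX_TYPE_SHIFT) | id;
}

/* Extract the type code from bit[31:28]. */
static uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}

/* Extract the per-type id from bit[27:0]. */
static uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}
```

For example, drc_index_make(DRC_TYPE_CODE_PCI, 5) yields 0x40000005. Because
<index> is formally arbitrary, a guest should treat this decoding as a
debugging aid rather than something to rely on.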
-
-rtas-set-power-level:
-  arg[0]: integer identifying power domain
-  arg[1]: new power level for the domain, 0-100
-  output[0]: status, 0 on success
-  output[1]: power level after command
-
-  Set the power level for a specified power domain
-
-rtas-get-power-level:
-  arg[0]: integer identifying power domain
-  output[0]: status, 0 on success
-  output[1]: current power level
-
-  Get the power level for a specified power domain
-
-rtas-set-indicator:
-  arg[0]: integer identifying sensor/indicator type
-  arg[1]: index of sensor, for DR-related sensors this is generally the
-          DRC index
-  arg[2]: desired sensor value
-  output[0]: status, 0 on success
-
-  Set the state of an indicator or sensor. For the purpose of this document we
-  focus on the indicator/sensor types associated with a DRC. The types are:
-
-    9001: isolation-state, controls/indicates whether a device has been made
-          accessible to a guest
-
-          supported sensor values:
-            0: isolate, device is made inaccessible to guest OS
-            1: unisolate, device is made available to guest OS
-
-    9002: dr-indicator, controls "visual" indicator associated with device
-
-          supported sensor values:
-            0: inactive, resource may be safely removed
-            1: active, resource is in use and cannot be safely removed
-            2: identify, used to visually identify slot for interactive hotplug
-            3: action, in most cases, used in the same manner as identify
-
-    9003: allocation-state, generally only used for "logical" DR resources to
-          request the allocation/deallocation of a resource prior to acquiring
-          it via isolation-state->unisolate, or after releasing it via
-          isolation-state->isolate, respectively. For "physical" DR (like PCI
-          hotplug/unplug) the pre-allocation of the resource is implied and
-          this sensor is unused.
-
-          supported sensor values:
-            0: unusable, tell firmware/system the resource can be
-               unallocated/reclaimed and added back to the system resource pool
-            1: usable, request the resource be allocated/reserved for use by
-               guest OS
-            2: exchange, used to allocate a spare resource to use for fail-over
-               in certain situations. unused in QEMU
-            3: recover, used to reclaim a previously allocated resource that's
-               not currently allocated to the guest OS. unused in QEMU
-
-rtas-get-sensor-state:
-  arg[0]: integer identifying sensor/indicator type
-  arg[1]: index of sensor, for DR-related sensors this is generally the
-          DRC index
-  output[0]: status, 0 on success
-
-  Used to read an indicator or sensor value.
-
-  For DR-related operations, the only noteworthy sensor is dr-entity-sense,
-  which has a type value of 9003, as allocation-state does in the case of
-  rtas-set-indicator. The semantics/encodings of the sensor values are distinct
-  however:
-
-  supported sensor values for dr-entity-sense (9003) sensor:
-    0: empty,
-         for physical resources: DRC/slot is empty
-         for logical resources: unused
-    1: present,
-         for physical resources: DRC/slot is populated with a device/resource
-         for logical resources: resource has been allocated to the DRC
-    2: unusable,
-         for physical resources: unused
-         for logical resources: DRC has no resource allocated to it
-    3: exchange,
-         for physical resources: unused
-         for logical resources: resource available for exchange (see
-           allocation-state sensor semantics above)
-    4: recovery,
-         for physical resources: unused
-         for logical resources: resource available for recovery (see
-           allocation-state sensor semantics above)
-
-rtas-ibm-configure-connector:
-  arg[0]: guest physical address of 4096-byte work area buffer
-  arg[1]: 0, or address of additional 4096-byte work area buffer.
-          only non-zero if a prior RTAS response indicated a need for
-          additional memory
-  output[0]: status:
-               0: completed transmittal of device-tree node
-               1: instruct guest to prepare for next DT sibling node
-               2: instruct guest to prepare for next DT child node
-               3: instruct guest to prepare for next DT property
-               4: instruct guest to ascend to parent DT node
-               5: instruct guest to provide additional work-area buffer
-                  via arg[1]
-            990x: instruct guest that operation took too long and to try
-                  again later
-
-  Used to fetch an OF device-tree description of the resource associated with
-  a particular DRC. The DRC index is encoded in the first 4-bytes of the first
-  work area buffer.
-
-  Work area layout, using 4-byte offsets:
-    wa[0]: DRC index of the DRC to fetch device-tree nodes from
-    wa[1]: 0 (hard-coded)
-    wa[2]: for next-sibling/next-child response:
-             wa offset of null-terminated string denoting the new node's name
-           for next-property response:
-             wa offset of null-terminated string denoting new property's name
-    wa[3]: for next-property response (unused otherwise):
-             byte-length of new property's value
-    wa[4]: for next-property response (unused otherwise):
-             new property's value, encoded as an OFDT-compatible byte array
-
-== hotplug/unplug events ==
-
-For most DR operations, the hypervisor will issue host->guest add/remove events
-using the EPOW/check-exception notification framework, where the host issues a
-check-exception interrupt, then provides an RTAS event log via an
-rtas-check-exception call issued by the guest in response. This framework is
-documented by PAPR+ v2.7, and is already in use by QEMU for generating
-powerdown requests via EPOW events.
-
-For DR, this framework has been extended to include hotplug events, which were
-previously unneeded due to direct manipulation of DR-related guest userspace
-tools by host-level management such as an HMC.
-This level of management is not applicable to PowerKVM, hence the reason for
-extending the notification framework to support hotplug events.
-
-Note that these events are not yet formally part of the PAPR+ specification,
-but support for this format has already been implemented in DR-related
-guest tools such as powerpc-utils/librtas, as well as kernel patches that have
-been submitted to handle in-kernel processing of memory/cpu-related hotplug
-events[1], and is planned for formal inclusion in the PAPR+ specification. The
-hotplug-specific payload is implemented in QEMU as follows (with all values
-encoded in big-endian format):
-
-struct rtas_event_log_v6_hp {
-#define SECTION_ID_HOTPLUG              0x4850 /* HP */
-    struct section_header {
-        uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
-        uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
-                                         * plus the length of the DRC name
-                                         * if a DRC name identifier is
-                                         * specified for hotplug_identifier
-                                         */
-        uint8_t section_version;        /* version 1 */
-        uint8_t section_subtype;        /* unused */
-        uint16_t creator_component_id;  /* unused */
-    } hdr;
-#define RTAS_LOG_V6_HP_TYPE_CPU         1
-#define RTAS_LOG_V6_HP_TYPE_MEMORY      2
-#define RTAS_LOG_V6_HP_TYPE_SLOT        3
-#define RTAS_LOG_V6_HP_TYPE_PHB         4
-#define RTAS_LOG_V6_HP_TYPE_PCI         5
-    uint8_t hotplug_type;               /* type of resource/device */
-#define RTAS_LOG_V6_HP_ACTION_ADD       1
-#define RTAS_LOG_V6_HP_ACTION_REMOVE    2
-    uint8_t hotplug_action;             /* action (add/remove) */
-#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
-#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
-#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
-    uint8_t hotplug_identifier;         /* type of the resource identifier,
-                                         * which serves as the discriminator
-                                         * for the 'drc' union field below
-                                         */
-    uint8_t reserved;
-    union {
-        uint32_t index;                 /* DRC index of resource to take action
-                                         * on
-                                         */
-        uint32_t count;                 /* number of DR resources to take
-                                         * action on (guest chooses which)
-                                         */
-        char name[1];                   /* string representing the name of the
-                                         * DRC to take action on
-                                         */
-    } drc;
-} QEMU_PACKED;
-
-== ibm,lrdr-capacity ==
-
-ibm,lrdr-capacity is a property in the /rtas device tree node that identifies
-the dynamic reconfiguration capabilities of the guest. It is a triple
-consisting of <phys>, <size> and <maxcpus>.
-
-  <phys>, encoded in BE format, represents the maximum address in bytes and
-  hence the maximum memory that can be allocated to the guest.
-
-  <size>, encoded in BE format, represents the size increments in which
-  memory can be hot-plugged to the guest.
-
-  <maxcpus>, a BE-encoded integer, represents the maximum number of
-  processors that the guest can have.
-
-pseries guests use this property to note the maximum allowed CPUs for the
-guest.
-
-== ibm,dynamic-reconfiguration-memory ==
-
-ibm,dynamic-reconfiguration-memory is a device tree node that represents
-dynamically reconfigurable logical memory blocks (LMB). This node
-is generated only when the guest advertises the support for it via
-ibm,client-architecture-support call. Memory that is not dynamically
-reconfigurable is represented by /memory nodes. The properties of this
-node that are of interest to the sPAPR memory hotplug implementation
-in QEMU are described here.
-
-ibm,lmb-size
-
-This 64bit integer defines the size of each dynamically reconfigurable LMB.
-
-ibm,associativity-lookup-arrays
-
-This property defines a lookup array in which the NUMA associativity
-information for each LMB can be found. It is a property encoded array
-that begins with an integer M, the number of associativity lists followed
-by an integer N, the number of entries per associativity list and terminated
-by M associativity lists each of length N integers.
-
-This property provides the same information as given by ibm,associativity
-property in a /memory node. Each assigned LMB has an index value between
-0 and M-1 which is used as an index into this table to select which
-associativity list to use for the LMB.
-This index value for each LMB is defined in ibm,dynamic-memory property.
-
-ibm,dynamic-memory
-
-This property describes the dynamically reconfigurable memory. It is a
-property encoded array that has an integer N, the number of LMBs followed
-by N LMB list entries.
-
-Each LMB list entry consists of the following elements:
-
-- Logical address of the start of the LMB encoded as a 64bit integer. This
-  corresponds to reg property in /memory node.
-- DRC index of the LMB that corresponds to ibm,my-drc-index property
-  in a /memory node.
-- Four bytes reserved for expansion.
-- Associativity list index for the LMB that is used as an index into
-  ibm,associativity-lookup-arrays property described earlier. This
-  is used to retrieve the right associativity list to be used for this
-  LMB.
-- A 32bit flags word. The bit at bit position 0x00000008 defines whether
-  the LMB is assigned to the partition as of boot time.
-
-[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
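The ibm,dynamic-memory entry layout listed above can be decoded as in the
following minimal sketch. It rests on assumptions the text does not state
outright: the leading count N, the DRC index and the associativity list index
are each taken to be 32-bit big-endian integers, which gives 24 bytes per
entry. The struct, macro and function names are hypothetical and do not come
from QEMU or the guest kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_SIZE     24u          /* 8 + 4 + 4 + 4 + 4 bytes (assumed) */
#define LMB_FLAG_ASSIGNED  0x00000008u  /* "assigned at boot" bit in flags */

/* Decoded view of one LMB list entry; the reserved bytes are dropped. */
struct lmb_entry {
    uint64_t addr;         /* logical start address of the LMB */
    uint32_t drc_index;    /* DRC index of the LMB */
    uint32_t assoc_index;  /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;        /* 32-bit flags word */
};

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit big-endian integer from a byte buffer. */
static uint64_t be64_read(const uint8_t *p)
{
    return ((uint64_t)be32_read(p) << 32) | be32_read(p + 4);
}

/*
 * Decode entry i of a raw ibm,dynamic-memory property.
 * Returns 0 on success, -1 if i is out of range or the buffer is too short.
 */
static int lmb_entry_parse(const uint8_t *prop, size_t len, uint32_t i,
                           struct lmb_entry *out)
{
    assert(prop != NULL && out != NULL);
    if (len < 4) {
        return -1;
    }

    uint32_t count = be32_read(prop);
    if (i >= count || i >= (len - 4) / LMB_ENTRY_SIZE) {
        return -1;
    }

    const uint8_t *e = prop + 4 + (size_t)i * LMB_ENTRY_SIZE;
    out->addr = be64_read(e);
    out->drc_index = be32_read(e + 8);
    /* e + 12 .. e + 15: four bytes reserved for expansion, skipped */
    out->assoc_index = be32_read(e + 16);
    out->flags = be32_read(e + 20);
    return 0;
}
```

A caller would test (entry.flags & LMB_FLAG_ASSIGNED) to decide whether the
LMB was assigned to the partition at boot, and use entry.assoc_index to pick
the matching list from ibm,associativity-lookup-arrays.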