=================
 Troubleshooting
=================


The Gateway Won't Start
=======================

If you cannot start the gateway (i.e., there is no existing ``pid``),
check whether an ``.asok`` file left behind by another user exists.
If such a file exists and there is no running ``pid``, remove the
``.asok`` file and try to start the process again.

This can happen when you start the process as the ``root`` user and
the startup script then tries to start it as the ``www-data`` or
``apache`` user: the existing ``.asok`` file prevents the script
from starting the daemon.
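
One way to tell a stale ``.asok`` file from one that belongs to a running
daemon is to try connecting to it: a socket file with no listener refuses the
connection. The following is a minimal Python sketch of that check and is not
part of Ceph. The function names ``socket_is_live`` and
``describe_admin_sockets`` are hypothetical, the default directory is the
``/var/run/ceph`` location mentioned elsewhere on this page, and the sketch
assumes it runs with enough privileges to connect to sockets owned by other
users (otherwise a ``PermissionError`` propagates).

```python
import os
import pwd
import socket
import stat


def socket_is_live(path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket at path."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(path)
        return True
    except (ConnectionRefusedError, FileNotFoundError):
        # No process is listening on this socket file (or it just vanished).
        return False
    finally:
        probe.close()


def describe_admin_sockets(directory="/var/run/ceph"):
    """Return (path, owner, is_live) for every .asok socket file in directory."""
    results = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".asok"):
            continue
        path = os.path.join(directory, name)
        info = os.stat(path)
        if not stat.S_ISSOCK(info.st_mode):
            continue  # Skip regular files that merely share the suffix.
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)  # The uid has no passwd entry.
        results.append((path, owner, socket_is_live(path)))
    return results
```

An entry whose last field is ``False`` is a candidate for removal as described
above. Confirm that no ``radosgw`` process is running before deleting it.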

The ``radosgw`` init script (``/etc/init.d/radosgw``) also has a verbose
argument that can provide some insight into what the issue could be::

  /etc/init.d/radosgw start -v

or::

  /etc/init.d/radosgw start --verbose

HTTP Request Errors
===================

Examining the access and error logs for the web server itself is
probably the first step in identifying what is going on.  If there is
a 500 error, that usually indicates a problem communicating with the
``radosgw`` daemon.  Ensure the daemon is running, its socket path is
configured, and that the web server is looking for it in the proper
location.


Crashed ``radosgw`` process
===========================

If the ``radosgw`` process dies, you will normally see a 500 error
from the web server (Apache, nginx, etc.).  In that situation,
restarting ``radosgw`` will restore service.

To diagnose the cause of the crash, check the log in ``/var/log/ceph``
and/or the core file (if one was generated).


Blocked ``radosgw`` Requests
============================

If some (or all) radosgw requests appear to be blocked, you can get
some insight into the internal state of the ``radosgw`` daemon via
its admin socket.  By default, there will be a socket configured to
reside in ``/var/run/ceph``, and the daemon can be queried with::

 ceph daemon /var/run/ceph/client.rgw help
 
 help                list available commands
 objecter_requests   show in-progress osd requests
 perfcounters_dump   dump perfcounters value
 perfcounters_schema dump perfcounters schema
 version             get protocol version

Of particular interest::

 ceph daemon /var/run/ceph/client.rgw objecter_requests
 ...

will dump information about current in-progress requests with the
RADOS cluster.  This allows one to identify if any requests are blocked
by a non-responsive OSD.  For example, one might see::

  { "ops": [
        { "tid": 1858,
          "pg": "2.d2041a48",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.949872",
          "attempts": 1,
          "object_id": "fatty_25647_object1857",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.949813",
          "osd_ops": [
                "write 0~4096"]},
        { "tid": 1873,
          "pg": "2.695e9f8e",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.970615",
          "attempts": 1,
          "object_id": "fatty_25647_object1872",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.970555",
          "osd_ops": [
                "write 0~4096"]}],
  "linger_ops": [],
  "pool_ops": [],
  "pool_stat_ops": [],
  "statfs_ops": []}

In this dump, two requests are in progress.  The ``last_sent`` field is
the time the RADOS request was sent.  If that time is well in the past, it
suggests that the OSD is not responding.  For example, for request 1858, you
could find the OSDs that serve its placement group with::

 ceph pg map 2.d2041a48
 
 osdmap e9 pg 2.d2041a48 (2.0) -> up [1,0] acting [1,0]

This tells us to look at ``osd.1``, which holds the primary copy of this PG::

 ceph daemon osd.1 ops
 { "num_ops": 651,
  "ops": [
        { "description": "osd_op(client.4124.0:1858 fatty_25647_object1857 [write 0~4096] 2.d2041a48)",
          "received_at": "1331247573.344650",
          "age": "25.606449",
          "flag_point": "waiting for sub ops",
          "client_info": { "client": "client.4124",
              "tid": 1858}},
 ...

The ``flag_point`` field indicates that the OSD is currently waiting
for replicas to respond, in this case ``osd.0``.
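
The ``last_sent`` check and the ``ceph pg map`` lookup described above can be
scripted. The Python sketch below assumes the JSON layout, the timestamp
format, and the ``acting [...]`` output format of the samples shown in this
section, which other Ceph releases may not share. The function names and the
30-second threshold are hypothetical, and treating the first OSD in the acting
set as the primary follows the example above.

```python
import json
import re
from datetime import datetime


def find_stalled_requests(dump_text, now, threshold_seconds=30.0):
    """Return (tid, osd, pg, age_seconds) for ops sent longer ago than the threshold."""
    dump = json.loads(dump_text)
    stalled = []
    for op in dump.get("ops", []):
        # Timestamp format as in the sample dump: "2012-03-08 14:56:37.949872".
        sent = datetime.strptime(op["last_sent"], "%Y-%m-%d %H:%M:%S.%f")
        age = (now - sent).total_seconds()
        if age > threshold_seconds:
            stalled.append((op["tid"], op["osd"], op["pg"], age))
    return stalled


def primary_osd(pg_map_output):
    """Return the first OSD id in the acting set of ``ceph pg map`` output, or None."""
    match = re.search(r"acting \[(\d+)", pg_map_output)
    return int(match.group(1)) if match else None
```

Pass ``find_stalled_requests`` the text printed by
``ceph daemon /var/run/ceph/client.rgw objecter_requests`` together with
``datetime.now()``; this assumes the timestamps are in the local time of the
host running the script.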


Java S3 API Troubleshooting
===========================


Peer Not Authenticated
----------------------

You may receive an error that looks like this:: 

     [java] INFO: Unable to execute HTTP request: peer not authenticated

The Java SDK for S3 requires a valid certificate from a recognized certificate
authority, because it uses HTTPS by default. If you are just testing the Ceph
Object Storage services, you can resolve this problem in one of the following
ways:

#. Prepend the IP address or hostname with ``http://``. For example, change this::

	conn.setEndpoint("myserver");

   To:: 

	conn.setEndpoint("http://myserver");

#. After setting your credentials, add a client configuration and set the 
   protocol to ``Protocol.HTTP``. :: 

			AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
			
			ClientConfiguration clientConfig = new ClientConfiguration();
			clientConfig.setProtocol(Protocol.HTTP);
			
			AmazonS3 conn = new AmazonS3Client(credentials, clientConfig);



405 MethodNotAllowed
--------------------

If you receive a 405 error, check that the S3 subdomain is set up correctly.
Subdomain functionality requires a wildcard entry in your DNS record.

Also, check that the default site is disabled. The error looks like this::

     [java] Exception in thread "main" Status Code: 405, AWS Service: Amazon S3, AWS Request ID: null, AWS Error Code: MethodNotAllowed, AWS Error Message: null, S3 Extended Request ID: null
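
The wildcard DNS requirement can be checked by resolving an arbitrary
subdomain of the gateway host. The following is a minimal Python sketch under
stated assumptions: the function name ``wildcard_dns_resolves`` and the probe
label are hypothetical, and the ``resolver`` parameter exists only so the
lookup can be replaced in tests.

```python
import socket


def wildcard_dns_resolves(gateway_host, probe_label="wildcard-probe",
                          resolver=socket.gethostbyname):
    """Return True if an arbitrary subdomain of gateway_host resolves."""
    probe_name = f"{probe_label}.{gateway_host}"
    try:
        resolver(probe_name)
        return True
    except socket.gaierror:
        # The name did not resolve, so no wildcard record covers it.
        return False
```

For example, ``wildcard_dns_resolves("myserver")`` looks up
``wildcard-probe.myserver``. A ``True`` result only shows that the name
resolves; it does not confirm that the address belongs to the gateway.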