"The trigger for this patch is the nodetool remove command, the expected
behaviour is that the error that return from the API would be populate to the
JMX client.
In this first stage the API errors are send as a generic RuntimeException, in a
later phase specific methods would catch this exception and replace it with a
specific one. Note that the thrown exceptions should be one that is known to the client,
or it would fail to handle it and in general should be equivalent to origin.
After this patch a call to the nodetool remove with a bad host name would result in:
./bin/nodetool removenode dcc0477a-b4a7-448a-bc79-27853f61b92d
error: Host ID not found.
-- StackTrace --
java.lang.RuntimeException: Host ID not found.
at com.cloudius.urchin.api.APIClient.getException(APIClient.java:114)
at com.cloudius.urchin.api.APIClient.post(APIClient.java:101)
....."
The API returns errors with an HTTP code 400, 404 or 500 depending on
the cause with a json object that contains the failure reason. The
error message should be populate to the JMX calling client, translated
to the appropriate exception.
This patch adds the ability to detect API failure and throw a runtime
exception with the returned message.
It is up to the calling method what to do with the exception, if it
would do nothing, the calling client would get a RuntimeException,
depends on origin MBean definition, the caller can catch the exception
and throw a specific kind.
Note that any exception that is thrown must be known to the JMX client
or it will not be able to process it.
As a first step, this patch replaces the jersey client to the newer
version under glassfish, which has an easy way of getting a Reply
object, and check its status before returning the results.
The only difference in the method that uses the APIClient is the use of
MultivaluedHashMap.
The following MBean implementation where changed
ColumnFamily
CompactionManager
Gossiper
EndpointSnitchInfo
CacheService
StorageProxy
StorageService
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
There is a confusion in origin, the MBean declare token as the parameter
to remove, but the implementation actually uses host id.
This patch modify scylla implementation to pass a host id as the
parameter to remove.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This series adds latency related depricated methods to the storage
proxy.
The implmenetation mimic origin, in which the depricated methods calls
the counters that replaces them."
In LatencyMetrics the URL is passed without the ending slash, this
patch use the same notation in ClientRequestMetrics.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This follow origin by adding the implementation for the depricated
metrics methods.
Similiar to origin, the implementation calls the implementation in the
metrics.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This series together with the cfhistogram series in scylla adds the
missing functionality so that nodetoold cfhistogram would work.
After both series will be apply an execution example is:
./bin/nodetool cfhistograms keyspace1 standard1
keyspace1/standard1 histograms
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 0.00 6866.00 4866323.00 310 5
75% 0.00 8239.00 10090808.00 310 5
95% 0.00 20501.00 17436917.00 310 5
98% 0.00 35425.00 25109160.00 310 5
99% 0.00 51012.00 25109160.00 310 5
Min 0.00 2300.00 654950.00 259 5
Max 0.00 20924300.00 25109160.00 310 5"
This patch uses the estimated latency that was added to the column
family metrics to get the recent and estimated latency.
It follows the same logic as origin does to call the logic in metrics.
The following method implementation will be added:
getMemtableColumnsCount
getRecentSSTablesPerReadHistogram
getSSTablesPerReadHistogram
getLifetimeReadLatencyHistogramMicros
getRecentReadLatencyHistogramMicros
getRecentReadLatencyMicros
getLifetimeWriteLatencyHistogramMicros
getRecentWriteLatencyHistogramMicros
getRecentWriteLatencyMicros
getRangeCount
getTotalRangeLatencyMicros
getLifetimeRangeLatencyHistogramMicros
getRecentRangeLatencyHistogramMicros
getRecentRangeLatencyMicros
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This uses the recent estimated histogram and the API based estimated
histogram to support the sstable per read recent and total estimated
histogram.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the depricated total and recent estimated histogram.
It uses the new RecentEstimatedHistogram for the recent value and the
API based estimated histogram for the total latency.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The EstimatedHistogramWrapper is a helper class that holds the API
related data, so that a class that uses an EstimatedHistogram can
replace it with the wrapper and keep most of its code as is.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch modify the CacheEntry to support both String and
EstimatedHistogram.
It is possible to add more supported types in the future when needed.
In the APIClient, the cache will now support both String and
EstimatedHistogram in a similiar way.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
apiclient need to merge to cache
This patch allows to create an EstimatedHistogram from an array of data
value.
It will be used by the APIClient to return EstimatedHistogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Some of the jmx methods uses the notion of recent estimated histogram.
In origin the implementation uses an estimated histogram and clean the
histogram values on each call.
The RecentEstimatedHistogram mimic this behaviour, it store the latest
values of the last call. In each call new values are stored in the
histogram and the results is the delta between the last two calls.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The existing scylla-jmx code had reversed logic for the success of the repair:
it reported to "nodetool repair" a failure when the repair was successful :-)
Note that "nodetool repair" waits until a FINISHED notification, and then reports
a failure if it previously got any SESSION_FAILED notification; So if repair was
successful, all we need to do is to avoid sending a "SESSION_FAILED" message.
But we don't need to send any additional "SESSION_SUCCESS" message to signal
success. That message type is only used to report progress to the user (a
"session" is part of the repair work, so seeing sessions completing shows
progress), but because Scylla doesn't support this progress report yet, we
can't send these notifications yet, and there's no point in sending one such
message at the end - it's only confusing (especially when the text is the same
as that of the FINISHED message).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
"Some of the nodetool commands calls multiple time to the same methods,
specifically nodetool status and ring calls the endpoint snitch get rack
and datacenter many times during the same command. It result in a long
execution time.
This series adds the ability to cache results when needed. Typically
cached result will be for a few seconds, there is no need to a clean up
mechanism as we don't expect there will be too much information in the
cache.
Currently the only cached results are for the endpoint snitch but it is
easy to add more if we see that it is needed."
This cache the get data center and get rack results for 10s, it has a
direct impact on nodetool status and nodetool ring
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Some operations are not changing frequently and are called multiple time
during a nodetool execution.
This patch adds the ability to cache results for a define period of time
(typically it will be for a few seconds) so that during the same
nodetool command call, the results will be retrieved from the cache.
It is currently only implemented for string values, other commands will
be added when needed.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This series complete the scylla series to support the nodetool cfstatus support.
After this series it will be possible to call nodetoold cfstatus and get a meaningfull output.
An output example:
./bin/nodetool cfstats keyspace1
Keyspace: keyspace1
Read Count: 87657
Read Latency: 1.1418900715287998 ms.
Write Count: 87177
Write Latency: 0.022303761313190406 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 8
SSTables in each level: [ Space used (live): 92356832
Space used (total): 92356832
Space used by snapshots (total): 0
Off heap memory used (total): 106430512
SSTable Compression Ratio: 0.0
Number of keys (estimate): 328672
Memtable cell count: 100000
Memtable data size: 84800254
Memtable off heap memory used: 105906176
Memtable switch count: 4
Local read count: 92854
Local read latency: 1.039 ms
Local write count: 93880
Local write latency: 1.045 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 208416
Bloom filter off heap memory used: 524336
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 259
Compacted partition maximum bytes: 310"
Amnon Heiman (1):
EndPointSnitchInfo: Support null as host name
Asias He (1):
rpm: Fix build in centos container
Takuya ASADA (2):
dist: check hostname is resolvable
dist: does not need maven for running time
The logic of that timer, should be that after some defined time from the
previous request a new one will be sent, the actuall rate is
meaningless and only cause delyed request to be sent in a higher
frequency after the delayed reponse was returned.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This combine the different constructor into one constructor with the
logic and another one that calls it with a default value.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The use of mean and variance as histogram parameter names makes more
sense.
This also make it safe to use an empty histogram that holds no samples.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
In origin an undocumented feature (bug?) is that passing null as a host
name for getRack and getDatacenter returns the rack or datacenter
according to the loopbck address.
This follow the same behaviour, so when the host is null, the function
will not fail but will call the API with the local loopback address
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This set the rmi port to listen on the jmx port, so the java will not
choose a random port for it.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>