hdrhistogram/serialization/interval_log/mod.rs
1//! Interval log parsing and writing.
2//!
3//! Interval logs, as handled by the Java implementation's `HistogramLogWriter`,
4//! `HistogramLogReader`, and `HistogramLogProcessor`, are a way to record a sequence of histograms
5//! over time. Suppose you were running a load test for an hour: you might want to record a
6//! histogram per second or minute so that you could correlate measurements with behavior you might
7//! see in logs, etc.
8//!
9//! An interval log contains some initial metadata, then a sequence of histograms, each with some
10//! additional metadata (timestamps, etc). See `IntervalLogHistogram`.
11//!
12//! The intervals in the log should be ordered by start timestamp. It's possible to write (and
13//! parse) logs with intervals in any order, but the expectation is that they will be sorted.
14//!
15//! To parse a log, see `IntervalLogIterator`. To write a log, see `IntervalLogWriterBuilder`.
16//!
17//! # Timestamps
18//!
19//! Each interval has a timestamp in seconds associated with it. However, it's not necessarily as
20//! simple as just interpreting the number as seconds since the epoch. There are two optional pieces
21//! of header metadata: "StartTime" and "BaseTime". Neither, one, or both of these may be present.
22//! It is possible to have multiple StartTime or BaseTime entries in the log, or even interleaved
23//! with interval histograms, but that is confusing, so this API prevents you from doing so.
24//!
25//! When BaseTime is present, per-interval timestamps are the number of seconds since BaseTime. When
26//! it is absent, the per-interval timestamps represent fractional seconds since the epoch (aka
27//! Unix time). BaseTime is useful because it allows you to have somewhat human-readable timestamps
28//! for each interval -- it's easier to see 245.3 and see that that's 4 minutes after the start of
29//! whatever it is you're doing than it is to mentally parse a Unix timestamp. Naturally, you can
30//! always calculate the deltas after the fact if you're willing to write a little tooling, but in
31//! some cases log files are consumed by humans too.
32//!
33//! While BaseTime is used to redefine per-interval timestamps for human readability, StartTime
34//! provides information about when the process that is generating the log reached some "start"
35//! condition. It's frequently the case that intervals will start some seconds after the process
36//! started, whether due to initial warmup before a benchmark or because it just takes a
37//! while to start up. If this is relevant to your workload, use StartTime to record the actual
38//! process start time (or other relevant "start" condition, like when a benchmark begins on an
39//! already long-running process). You could then use this when processing a log to more accurately
40//! plot interval data over time. Put differently, StartTime doesn't change the interpretation of
41//! interval timestamps directly like BaseTime; instead, it provides a hint to the consumer of the
42//! log that the "start" (whatever that means to you -- process start, etc) was at a different time
43//! than that associated with the first interval.
44//!
45//! #### Example scenario
46//!
47//! To explain their usage, suppose we're running a multi-hour benchmark on a process that starts
48//! up at a Unix time of 1500000000. We'll be recording separate log files per hour just to
49//! demonstrate the interaction between BaseTime, StartTime, and interval log timestamps.
50//!
51//! The process starts up, warms up its caches, JIT compiles, etc and is ready to start its
52//! benchmark 40 seconds later, so we create the first interval log file and record a StartTime of
53//! 1500000040. If the actual process start (rather than benchmark start) is more useful to you,
54//! using a StartTime of 1500000000 would be reasonable, but we'll stick with 1500000040.
55//!
56//! We'll use a BaseTime of 1500000040 because that's when the benchmark began, but 1500000000 would
57//! also be a reasonable choice here -- it would just make the per-interval deltas 40 seconds
58//! larger, which might be a more useful way of recording them, depending on the situation.
59//!
60//! The benchmark produces an interval histogram for each 60 seconds of workload, so the first one
61//! is ready at 1500000100, and is recorded with a delta timestamp of 60. This goes on for another
62//! hour, with the last one being 3540 seconds after the start of the benchmark with a corresponding
63//! delta of 3540.
64//!
65//! At the start of the second hour, the time is 1500003640 and the first log file is ended (which
66//! is straightforward: just stop writing to it and close the file) and the second log file is
67//! opened. It still uses a StartTime of 1500000040 because we want to represent that this log
68//! pertains to something that started an hour ago, but we'll use a BaseTime of 1500003640 so that
69//! our delta timestamps start over at 0. Again, you could just as well decide to use the same
70//! BaseTime of 1500000040 if you prefer, but it's arguably easier for a human to parse "h hours
71//! into the benchmark, s seconds past the hour" than it is to mentally divide by 3600, so we'll
72//! go with the new BaseTime.
73//!
74//! Suppose now you wanted to process these logs and show information (e.g. the 0.999th quantile of
75//! each interval) as the y-axis on a plot where time is the x-axis. You would want to have
76//! StartTime be the zero of the x-axis. For each interval, calculate the Unix time by adding its
77//! timestamp to BaseTime, where BaseTime is assumed to be zero if it's not specified. The point on
78//! the x-axis for that interval would be the result of subtracting StartTime from that Unix time.
79//! As an example, the 17th minute in the 2nd hour would have an interval timestamp of approximately
80//! 1020, which when added to 1500003640 is 1500004660. The difference between StartTime and then is
81//! 4620. You might choose to display that in units of minutes, which would be 4620 / 60 = 77.
82//!
83//! #### Java interop
84//!
85//! Since you may be interoperating with the Java reference implementation, here's a summary of the
86//! logic used by the `HistogramLogReader` class for StartTime and BaseTime. It's slightly different
87//! than what was described above, presumably for legacy compatibility reasons. This class stores
88//! the StartTime as a field which is exposed via a getter, and also integrates filtering intervals
89//! based on windows for either "absolute" (Unix time) timestamps or "offset" timestamps (delta vs
90//! the StartTime), so if you're using that filtering functionality, you need to understand how it
91//! ends up setting its internal version of StartTime.
92//!
93//! - Neither StartTime nor BaseTime are present: interval timestamps are interpreted as seconds
94//! since the epoch. The first interval's timestamp is stored to the StartTime field.
95//! - StartTime is present: StartTime is a number of seconds since epoch, and interval timestamps
96//! may be interpreted as deltas to be added to StartTime or as "absolute" Unix time depending on a
97//! heuristic. In other words, the heuristic chooses between setting the effective BaseTime to 0 or
98//! to StartTime. Specifically, the heuristic interprets interval timestamps as deltas if they are
99//! more than a year's worth of seconds smaller than StartTime and as absolute timestamps otherwise.
100//! - BaseTime is present: BaseTime is a number of seconds since epoch, and interval timestamps are
101//! interpreted as deltas. The first interval's (delta) timestamp is stored to the StartTime field.
102//! This is likely a bug, since StartTime should be an absolute timestamp, and appears to cause
103//! erroneous behavior when filtering by offset timestamps.
104//! - BaseTime and StartTime are present: The BaseTime is used like it is when it's the only one
105//! present: it's a number of seconds since epoch that serves as the starting point for the
106//! per-interval deltas to get a wall-clock time for each interval. No heuristics are applied to
107//! guess whether or not the intervals are absolute or deltas.
108//!
109//! The Java implementation also supports re-setting the StartTime and BaseTime if those entries
110//! exist more than once in the log. Suppose that you had an hour's worth of per-minute intervals,
111//! and then you set another StartTime to the current time and added a second hour's worth of
112//! intervals. Filtering for "all intervals between 5 and 10 minutes past the start" would result in
113//! two disjoint sequences of intervals, which is at the very least arguably unintuitive. However,
114//! you may well be working with log files that are structured that way, so this implementation's
115//! `IntervalLogIterator` will expose multiple StartTime, etc, entries as they appear in the log
116//! file.
117//!
118//! # Examples
119//!
120//! Parse a single interval from a log.
121//!
122//! ```
123//! use std::time;
124//! use hdrhistogram::serialization::interval_log;
125//!
126//! // two newline-separated log lines: a comment, then an interval
127//! let log = b"#I'm a comment\nTag=t,0.127,1.007,2.769,base64EncodedHisto\n";
128//!
129//! let mut iter = interval_log::IntervalLogIterator::new(&log[..]);
130//!
131//! // the comment is consumed and ignored by the parser, so the first event is an Interval
132//! match iter.next().unwrap() {
133//! Ok(interval_log::LogEntry::Interval(h)) => {
134//! assert_eq!(time::Duration::new(0, 127_000_000), h.start_timestamp());
135//! }
136//! _ => panic!()
137//! }
138//!
139//! // there are no more lines in the log; iteration complete
140//! assert_eq!(None, iter.next());
141//! ```
142//!
143//! Skip logs that started before 3 seconds.
144//!
145//! ```
146//! use hdrhistogram::serialization::interval_log;
147//!
148//! let log = "\
149//! #I'm a comment\n\
150//! Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
151//! 1.456,1.007,2.769,base64EncodedHisto\n\
152//! 3.789,1.007,2.769,base64EncodedHisto\n\
153//! Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
154//! 5.456,1.007,2.769,base64EncodedHisto\n\
155//! #Another comment\n"
156//! .as_bytes();
157//!
158//! let iter = interval_log::IntervalLogIterator::new(&log);
159//!
160//! let count = iter
161//! // only look at intervals (which are the only non-comment lines in this log)
162//! .filter_map(|e| match e {
163//! Ok(interval_log::LogEntry::Interval(ilh)) => Some(ilh),
164//! _ => None
165//! })
166//! // do any filtering you want
167//! .filter(|ilh| ilh.start_timestamp().as_secs() >= 3)
168//! .count();
169//!
170//! assert_eq!(3, count);
171//! ```
172//!
173//! Write a log.
174//!
175//! ```
176//! use std::{str, time};
177//! use hdrhistogram;
178//! use hdrhistogram::serialization;
179//! use hdrhistogram::serialization::interval_log;
180//!
181//! let mut buf = Vec::new();
182//! let mut serializer = serialization::V2Serializer::new();
183//!
184//! let mut h = hdrhistogram::Histogram::<u64>::new_with_bounds(
185//! 1, u64::max_value(), 3).unwrap();
186//! h.record(12345).unwrap();
187//!
188//! // limit scope of mutable borrow of `buf`
189//! {
190//! let now = time::SystemTime::now();
191//! let mut log_writer = interval_log::IntervalLogWriterBuilder::new()
192//! .add_comment("Comments are great")
193//! .with_start_time(now)
194//! .begin_log_with(&mut buf, &mut serializer)
195//! .unwrap();
196//!
197//! log_writer.write_comment(
198//! "You can have comments anywhere in the log").unwrap();
199//!
200//! log_writer
201//! .write_histogram(
202//! &h,
203//! now.elapsed().unwrap(),
204//! time::Duration::new(12, 345_678_901),
205//! interval_log::Tag::new("im-a-tag")
206//! )
207//! .unwrap();
208//! }
209//!
210//! // `buf` is now full of stuff; we check for the first line
211//! assert_eq!("#Comments are great\n", &str::from_utf8(&buf).unwrap()[0..20]);
212//! ```
213
214use std::cmp::Ordering;
215use std::error::Error;
216use std::fmt::Write;
217use std::str::FromStr;
218use std::{fmt, io, ops, str, time};
219
220use base64::Engine as _;
221use nom::branch::alt;
222use nom::bytes::complete::{tag, take, take_until, take_while1};
223use nom::character::complete::char;
224use nom::character::is_digit;
225use nom::combinator::{complete, map_res, opt, recognize};
226use nom::error::ErrorKind;
227use nom::number::complete::double;
228use nom::{Err, IResult};
229
230use super::super::{Counter, Histogram};
231use super::Serializer;
232
233/// Prepare an `IntervalLogWriter`.
234///
235/// This type only allows writing comments and headers. Once you're done writing those things, use
236/// `into_log_writer()` to convert this into an `IntervalLogWriter`.
237pub struct IntervalLogWriterBuilder {
238 comments: Vec<String>,
239 start_time: Option<f64>,
240 base_time: Option<f64>,
241 max_value_divisor: f64,
242}
243
244impl Default for IntervalLogWriterBuilder {
245 fn default() -> Self {
246 Self::new()
247 }
248}
249
250impl IntervalLogWriterBuilder {
251 /// Create a new log writer that writes to `writer` and serializes histograms with `serializer`.
252 pub fn new() -> IntervalLogWriterBuilder {
253 IntervalLogWriterBuilder {
254 comments: Vec::new(),
255 start_time: None,
256 base_time: None,
257 max_value_divisor: 1.0,
258 }
259 }
260
261 /// Add a comment line to be written when the writer is built.
262 ///
263 /// Comments containing '\n' will be transformed into multiple lines of comments.
264 pub fn add_comment(&mut self, s: &str) -> &mut Self {
265 self.comments.push(s.to_owned());
266 self
267 }
268
269 /// Set a StartTime. See the module-level documentation for more info.
270 ///
271 /// This can be called multiple times, but only the value for the most recent invocation will
272 /// be written.
273 pub fn with_start_time(&mut self, time: time::SystemTime) -> &mut Self {
274 self.start_time = Some(system_time_as_fp_seconds(time));
275 self
276 }
277
278 /// Set a BaseTime. See the module-level documentation for more info.
279 ///
280 /// This can be called multiple times, but only the value for the most recent invocation will
281 /// be written.
282 pub fn with_base_time(&mut self, time: time::SystemTime) -> &mut Self {
283 self.base_time = Some(system_time_as_fp_seconds(time));
284 self
285 }
286
287 /// Set a max value divisor.
288 ///
289 /// This is used to scale down the max value part of an interval log to something that may be
290 /// more human readable. The max value in the log is only for human consumption, so you might
291 /// prefer to divide by 10<sup>9</sup> to turn nanoseconds into fractional seconds, for
292 /// instance.
293 ///
294 /// If this is not set, 1.0 will be used.
295 ///
296 /// This can be called multiple times, but only the value for the most recent invocation will
297 /// be written.
298 pub fn with_max_value_divisor(&mut self, max_value_divisor: f64) -> &mut Self {
299 self.max_value_divisor = max_value_divisor;
300 self
301 }
302
303 /// Build a LogWriter and apply any configured headers.
304 #[allow(clippy::float_cmp)]
305 pub fn begin_log_with<'a, 'b, W: 'a + io::Write, S: 'b + Serializer>(
306 &self,
307 writer: &'a mut W,
308 serializer: &'b mut S,
309 ) -> Result<IntervalLogWriter<'a, 'b, W, S>, io::Error> {
310 let mut internal_writer = InternalLogWriter {
311 writer,
312 serializer,
313 text_buf: String::new(),
314 serialize_buf: Vec::new(),
315 max_value_divisor: self.max_value_divisor,
316 };
317
318 for c in &self.comments {
319 internal_writer.write_comment(&c)?;
320 }
321
322 if let Some(st) = self.start_time {
323 internal_writer.write_fmt(format_args!(
324 "#[StartTime: {:.3} (seconds since epoch)]\n",
325 st
326 ))?;
327 }
328
329 if let Some(bt) = self.base_time {
330 internal_writer.write_fmt(format_args!(
331 "#[BaseTime: {:.3} (seconds since epoch)]\n",
332 bt
333 ))?;
334 }
335
336 // The Java impl doesn't write a comment for this but it's confusing to silently modify the
337 // max value without leaving a trace
338 if self.max_value_divisor != 1.0_f64 {
339 internal_writer.write_fmt(format_args!(
340 "#[MaxValueDivisor: {:.3}]\n",
341 self.max_value_divisor
342 ))?;
343 }
344
345 Ok(IntervalLogWriter { internal_writer })
346 }
347}
348
349/// Writes interval histograms in an interval log.
350///
351/// This isn't created directly; start with an `IntervalLogWriterBuilder`. Once you've written the
352/// headers and ended up with an `IntervalLogWriter`, typical usage would be to write a histogram
353/// at regular intervals (e.g. once a second).
354///
355/// ```
356/// use hdrhistogram::serialization;
357/// use hdrhistogram::serialization::interval_log;
358///
359/// let mut buf = Vec::new();
360/// let mut serializer = serialization::V2Serializer::new();
361///
362/// // create a writer via a builder
363/// let mut writer = interval_log::IntervalLogWriterBuilder::new()
364/// .begin_log_with(&mut buf, &mut serializer)
365/// .unwrap();
366///
367/// writer.write_comment("Comment 2").unwrap();
368///
369/// // .. write some intervals
370/// ```
371pub struct IntervalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
372 internal_writer: InternalLogWriter<'a, 'b, W, S>,
373}
374
375impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> IntervalLogWriter<'a, 'b, W, S> {
376 /// Write a comment line.
377 ///
378 /// Comments containing '\n' will be transformed into multiple lines of comments.
379 pub fn write_comment(&mut self, s: &str) -> io::Result<()> {
380 self.internal_writer.write_comment(s)
381 }
382
383 /// Write an interval histogram.
384 ///
385 /// `start_timestamp` is the time since the epoch in seconds that measurements started being
386 /// recorded in this interval. If you're using a StartTime or BaseTime offset, you should
387 /// instead use a delta since that time. See the discussion about timestamps in the module-level
388 /// documentation.
389 ///
390 /// `duration` is the duration of the interval in seconds.
391 ///
392 /// `tag` is an optional tag for this histogram.
393 pub fn write_histogram<T: Counter>(
394 &mut self,
395 h: &Histogram<T>,
396 start_timestamp: time::Duration,
397 duration: time::Duration,
398 tag: Option<Tag>,
399 ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
400 self.internal_writer
401 .write_histogram(h, start_timestamp, duration, tag)
402 }
403}
404
405/// Errors that can occur while writing a log.
406#[derive(Debug)]
407pub enum IntervalLogWriterError<E> {
408 /// Histogram serialization failed.
409 SerializeError(E),
410 /// An i/o error occurred.
411 IoError(io::Error),
412}
413
414impl<E> From<io::Error> for IntervalLogWriterError<E> {
415 fn from(e: io::Error) -> Self {
416 IntervalLogWriterError::IoError(e)
417 }
418}
419
420impl<E: fmt::Display + fmt::Debug> fmt::Display for IntervalLogWriterError<E> {
421 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
422 match self {
423 IntervalLogWriterError::SerializeError(e) => {
424 write!(f, "Histogram serialization failed: {}", e)
425 }
426 IntervalLogWriterError::IoError(e) => write!(f, "An i/o error occurred: {}", e),
427 }
428 }
429}
430
431impl<E: Error + 'static> Error for IntervalLogWriterError<E> {
432 fn source(&self) -> Option<&(dyn Error + 'static)> {
433 match self {
434 IntervalLogWriterError::SerializeError(e) => Some(e),
435 IntervalLogWriterError::IoError(e) => Some(e),
436 }
437 }
438}
439
440/// Write interval logs.
441struct InternalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
442 writer: &'a mut W,
443 serializer: &'b mut S,
444 text_buf: String,
445 serialize_buf: Vec<u8>,
446 max_value_divisor: f64,
447}
448
449impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> InternalLogWriter<'a, 'b, W, S> {
450 fn write_fmt(&mut self, args: fmt::Arguments) -> io::Result<()> {
451 self.writer.write_fmt(args)
452 }
453
454 fn write_comment(&mut self, s: &str) -> io::Result<()> {
455 for l in s.split('\n') {
456 writeln!(self.writer, "#{}", l)?;
457 }
458
459 Ok(())
460 }
461
462 fn write_histogram<T: Counter>(
463 &mut self,
464 h: &Histogram<T>,
465 start_timestamp: time::Duration,
466 duration: time::Duration,
467 tag: Option<Tag>,
468 ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
469 self.serialize_buf.clear();
470 self.text_buf.clear();
471
472 if let Some(Tag(s)) = tag {
473 write!(self.text_buf, "Tag={},", &s).expect("Writes to a String can't fail");
474 }
475
476 write!(
477 self.writer,
478 "{}{:.3},{:.3},{:.3},",
479 self.text_buf,
480 duration_as_fp_seconds(start_timestamp),
481 duration_as_fp_seconds(duration),
482 h.max() as f64 / self.max_value_divisor // because the Java impl does it this way
483 )?;
484
485 self.text_buf.clear();
486 let _len = self
487 .serializer
488 .serialize(h, &mut self.serialize_buf)
489 .map_err(IntervalLogWriterError::SerializeError)?;
490 base64::engine::general_purpose::STANDARD
491 .encode_string(&self.serialize_buf, &mut self.text_buf);
492
493 self.writer.write_all(self.text_buf.as_bytes())?;
494 self.writer.write_all(b"\n")?;
495
496 Ok(())
497 }
498}
499
500/// A tag for an interval histogram.
501///
502/// Tags are just `str`s that do not contain a few disallowed characters: ',', '\r', '\n', and ' '.
503///
504/// To get the wrapped `str` back out, use `as_str()` or the `Deref<str>` implementation
505/// (`&some_tag`).
506#[derive(Debug, PartialEq, Clone, Copy)]
507pub struct Tag<'a>(&'a str);
508
509impl<'a> Tag<'a> {
510 /// Create a new Tag.
511 ///
512 /// If a disallowed character is present, this will return `None`.
513 pub fn new(s: &'a str) -> Option<Tag<'a>> {
514 if s.chars()
515 .any(|c| c == ',' || c == '\r' || c == '\n' || c == ' ')
516 {
517 None
518 } else {
519 Some(Tag(s))
520 }
521 }
522
523 /// Returns the tag contents as a str.
524 pub fn as_str(&self) -> &'a str {
525 self.0
526 }
527}
528
529impl<'a> ops::Deref for Tag<'a> {
530 type Target = str;
531
532 fn deref(&self) -> &Self::Target {
533 self.as_str()
534 }
535}
536
537/// An individual interval histogram parsed from an interval log.
538#[derive(PartialEq, Debug)]
539pub struct IntervalLogHistogram<'a> {
540 tag: Option<Tag<'a>>,
541 start_timestamp: time::Duration,
542 duration: time::Duration,
543 max: f64,
544 encoded_histogram: &'a str,
545}
546
547impl<'a> IntervalLogHistogram<'a> {
548 /// Tag, if any is present.
549 pub fn tag(&self) -> Option<Tag<'a>> {
550 self.tag
551 }
552
553 /// Timestamp of the start of the interval in seconds, expressed as a `Duration` relative to
554 /// some start point.
555 ///
556 /// The timestamp may be absolute vs the epoch, or there may be a `StartTime` or `BaseTime` for
557 /// the log, in which case you may wish to consider this number as a delta vs those timestamps.
558 /// See the module-level documentation about timestamps.
559 pub fn start_timestamp(&self) -> time::Duration {
560 self.start_timestamp
561 }
562
563 /// Duration of the interval in seconds.
564 pub fn duration(&self) -> time::Duration {
565 self.duration
566 }
567
568 /// Max value in the encoded histogram
569 ///
570 /// This max value is the max of the histogram divided by some scaling factor (which may be
571 /// 1.0).
572 pub fn max(&self) -> f64 {
573 self.max
574 }
575
576 /// Base64-encoded serialized histogram.
577 ///
578 /// If you need the deserialized histogram, base64-decode and use a `Deserializer` on the
579 /// resulting bytes.
580 pub fn encoded_histogram(&self) -> &'a str {
581 self.encoded_histogram
582 }
583}
584
585#[derive(PartialEq, Debug)]
586/// Represents one non-comment line in an interval log.
587///
588/// One thing to note is that the way your interval timestamps work can vary. If your log was
589/// written with a StartTime or BaseTime, that metadata will appear in header comments, and that
590/// will be represented by the iterator providing the corresponding variants here. The presence
591/// of those timestamps will affect how you should interpret the timestamps for individual
592/// intervals. See the module-level documentation.
593#[allow(variant_size_differences)]
594pub enum LogEntry<'a> {
595 /// Logs may include a StartTime. If present, it represents seconds since the epoch.
596 StartTime(time::Duration),
597 /// Logs may include a BaseTime. If present, it represents seconds since the epoch.
598 BaseTime(time::Duration),
599 /// An individual interval histogram.
600 Interval(IntervalLogHistogram<'a>),
601}
602
603/// Errors that occur when parsing an interval log.
604#[derive(Debug, PartialEq)]
605pub enum LogIteratorError {
606 /// Parsing failed
607 ParseError {
608 /// Offset in the input where the failed parse started
609 offset: usize,
610 },
611}
612
613/// Parse interval logs.
614///
615/// This iterator exposes each item (excluding comments and other information-free lines). See
616/// `LogEntry`.
617///
618/// Because histogram deserialization is deferred, parsing logs is fast. See the `interval_log`
619/// benchmark if you wish to see how it does on your hardware. As a baseline, parsing a log of 1000
620/// random histograms of 10,000 values each takes 2ms total on an E5-1650v3.
621///
622/// Deferring deserialization is handy because it allows you to cheaply navigate the log to find
623/// the records you care about (e.g. ones in a certain time range, or with a certain tag) without
624/// doing all the allocation, etc, of deserialization.
625///
626/// If you're looking for a direct port of the Java impl's `HistogramLogReader`, this isn't one: it
627/// won't deserialize for you, and it pushes the burden of figuring out what to do with StartTime,
628/// BaseTime, etc to you, and there aren't built in functions to filter by timestamp. On the other
629/// hand, because it doesn't do those things, it is much more flexible: you can easily build any
630/// sort of filtering you want, not just timestamp ranges, because you have cheap access to all the
631/// metadata before incurring the cost of deserialization. If you're not using any timestamp
632/// headers, or at least using them in straightforward ways, it is easy to accumulate the
633/// timestamp state you need. Since all the parsing is taken care of already, writing your own
634/// `HistogramLogReader` equivalent that fits the way your logs are assembled is just a couple of
635/// lines. (And if you're doing complex stuff, we probably wouldn't have built something that fits
636/// your quirky logs anyway!)
637///
638/// This parses from a slice representing the complete file because it made implementation easier
639/// (and also supports mmap'd files for maximum parsing speed). If parsing from a `Read` is
640/// important for your use case, open an issue about it.
641pub struct IntervalLogIterator<'a> {
642 orig_len: usize,
643 input: &'a [u8],
644 ended: bool,
645}
646
647impl<'a> IntervalLogIterator<'a> {
648 /// Create a new iterator from the UTF-8 bytes of an interval log.
649 pub fn new(input: &'a [u8]) -> IntervalLogIterator<'a> {
650 IntervalLogIterator {
651 orig_len: input.len(),
652 input,
653 ended: false,
654 }
655 }
656}
657
658impl<'a> Iterator for IntervalLogIterator<'a> {
659 type Item = Result<LogEntry<'a>, LogIteratorError>;
660
661 fn next(&mut self) -> Option<Self::Item> {
662 loop {
663 if self.ended {
664 return None;
665 }
666
667 if self.input.is_empty() {
668 self.ended = true;
669 return None;
670 }
671
672 // Look for magic comments first otherwise they will get matched by the simple comment
673 // parser
674 if let Ok((rest, e)) = log_entry(self.input) {
675 self.input = rest;
676 return Some(Ok(e));
677 }
678
679 // it wasn't a log entry; try parsing a comment
680 match ignored_line(self.input) {
681 Ok((rest, _)) => {
682 self.input = rest;
683 continue;
684 }
685 _ => {
686 self.ended = true;
687 return Some(Err(LogIteratorError::ParseError {
688 offset: self.orig_len - self.input.len(),
689 }));
690 }
691 }
692 }
693 }
694}
695
696fn duration_as_fp_seconds(d: time::Duration) -> f64 {
697 d.as_secs() as f64 + f64::from(d.subsec_nanos()) / 1_000_000_000_f64
698}
699
700fn system_time_as_fp_seconds(time: time::SystemTime) -> f64 {
701 match time.duration_since(time::UNIX_EPOCH) {
702 Ok(dur_after_epoch) => duration_as_fp_seconds(dur_after_epoch),
703 // Doesn't seem possible to be before the epoch, but using a negative number seems like
704 // a reasonable representation if it does occur
705 Err(t) => duration_as_fp_seconds(t.duration()) * -1_f64,
706 }
707}
708
709fn start_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
710 let (input, _) = tag("#[StartTime: ")(input)?;
711 let (input, duration) = fract_sec_duration(input)?;
712 let (input, _) = char(' ')(input)?;
713 let (input, _) = take_until("\n")(input)?;
714 let (input, _) = take(1_usize)(input)?;
715 Ok((input, LogEntry::StartTime(duration)))
716}
717
718fn base_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
719 let (input, _) = tag("#[BaseTime: ")(input)?;
720 let (input, duration) = fract_sec_duration(input)?;
721 let (input, _) = char(' ')(input)?;
722 let (input, _) = take_until("\n")(input)?;
723 let (input, _) = take(1_usize)(input)?;
724 Ok((input, LogEntry::BaseTime(duration)))
725}
726
727fn tag_bytes(input: &[u8]) -> IResult<&[u8], &[u8]> {
728 let (input, _) = tag("Tag=")(input)?;
729 let (input, tag) = take_until(",")(input)?;
730 let (input, _) = take(1_usize)(input)?;
731 Ok((input, tag))
732}
733
734fn tag_parser(input: &[u8]) -> IResult<&[u8], Tag> {
735 let (input, tag) = map_res(tag_bytes, str::from_utf8)(input)?;
736 Ok((input, Tag(tag)))
737}
738
739fn interval_hist(input: &[u8]) -> IResult<&[u8], LogEntry> {
740 let (input, tag) = opt(tag_parser)(input)?;
741 let (input, start_timestamp) = fract_sec_duration(input)?;
742 let (input, _) = char(',')(input)?;
743 let (input, duration) = fract_sec_duration(input)?;
744 let (input, _) = char(',')(input)?;
745 let (input, max) = double(input)?;
746 let (input, _) = char(',')(input)?;
747 let (input, encoded_histogram) = map_res(take_until("\n"), str::from_utf8)(input)?;
748 // Be nice to Windows users:
749 let encoded_histogram = encoded_histogram.trim_end_matches('\r');
750 let (input, _) = take(1_usize)(input)?;
751
752 Ok((
753 input,
754 LogEntry::Interval(IntervalLogHistogram {
755 tag,
756 start_timestamp,
757 duration,
758 max,
759 encoded_histogram,
760 }),
761 ))
762}
763
764fn log_entry(input: &[u8]) -> IResult<&[u8], LogEntry<'_>> {
765 complete(alt((start_time, base_time, interval_hist)))(input)
766}
767
768fn comment_line(input: &[u8]) -> IResult<&[u8], ()> {
769 let (input, _) = tag("#")(input)?;
770 let (input, _) = take_until("\n")(input)?;
771 let (input, _) = take(1_usize)(input)?;
772 Ok((input, ()))
773}
774
775fn legend(input: &[u8]) -> IResult<&[u8], ()> {
776 let (input, _) = tag("\"StartTimestamp\"")(input)?;
777 let (input, _) = take_until("\n")(input)?;
778 let (input, _) = take(1_usize)(input)?;
779 Ok((input, ()))
780}
781
782fn ignored_line(input: &[u8]) -> IResult<&[u8], ()> {
783 alt((comment_line, legend))(input)
784}
785
786fn fract_sec_duration(input: &[u8]) -> IResult<&[u8], time::Duration> {
787 let (rest, data) = fract_sec_tuple(input)?;
788 let (secs, nanos_str) = data;
789
790 // only read up to 9 digits since we can only support nanos, not smaller precision
791 let nanos_parse_res = match nanos_str.len().cmp(&9) {
792 Ordering::Greater => nanos_str[0..9].parse::<u32>(),
793 Ordering::Equal => nanos_str.parse::<u32>(),
794 Ordering::Less => nanos_str
795 .parse::<u32>()
796 // subtraction will not overflow because len is < 9
797 .map(|n| n * 10_u32.pow(9 - nanos_str.len() as u32)),
798 };
799
800 if let Ok(nanos) = nanos_parse_res {
801 return Ok((rest, time::Duration::new(secs, nanos)));
802 }
803
804 // nanos were invalid utf8. We don't expose these errors, so don't bother defining a
805 // custom error type.
806 Err(Err::Error(error_position!(input, ErrorKind::Alpha)))
807}
808
809type FResult<'a> = IResult<&'a [u8], (u64, &'a str)>;
810
811fn fract_sec_tuple(input: &[u8]) -> FResult {
812 let (input, secs) = map_res(
813 map_res(recognize(take_until(".")), str::from_utf8),
814 u64::from_str,
815 )(input)?;
816 let (input, _) = tag(".")(input)?;
817 let (input, nanos_str) = map_res(complete(take_while1(is_digit)), str::from_utf8)(input)?;
818 Ok((input, (secs, nanos_str)))
819}
820
821#[cfg(test)]
822mod tests;