hdrhistogram/serialization/interval_log/
mod.rs

1//! Interval log parsing and writing.
2//!
3//! Interval logs, as handled by the Java implementation's `HistogramLogWriter`,
4//! `HistogramLogReader`, and `HistogramLogProcessor`, are a way to record a sequence of histograms
5//! over time. Suppose you were running a load test for an hour: you might want to record a
6//! histogram per second or minute so that you could correlate measurements with behavior you might
7//! see in logs, etc.
8//!
9//! An interval log contains some initial metadata, then a sequence of histograms, each with some
10//! additional metadata (timestamps, etc). See `IntervalLogHistogram`.
11//!
12//! The intervals in the log should be ordered by start timestamp. It's possible to write (and
13//! parse) logs with intervals in any order, but the expectation is that they will be sorted.
14//!
15//! To parse a log, see `IntervalLogIterator`. To write a log, see `IntervalLogWriterBuilder`.
16//!
17//! # Timestamps
18//!
19//! Each interval has a timestamp in seconds associated with it. However, it's not necessarily as
20//! simple as just interpreting the number as seconds since the epoch. There are two optional pieces
21//! of header metadata: "StartTime" and "BaseTime". Neither, one, or both of these may be present.
22//! It is possible to have multiple StartTime or BaseTime entries in the log, or even interleaved
23//! with interval histograms, but that is confusing, so this API prevents you from doing so.
24//!
25//! When BaseTime is present, per-interval timestamps are the number of seconds since BaseTime. When
26//! it is absent, the per-interval timestamps represent fractional seconds since the epoch (aka
27//! Unix time). BaseTime is useful because it allows you to have somewhat human-readable timestamps
28//! for each interval -- it's easier to see 245.3 and see that that's 4 minutes after the start of
29//! whatever it is you're doing than it is to mentally parse a Unix timestamp. Naturally, you can
30//! always calculate the deltas after the fact if you're willing to write a little tooling, but in
31//! some cases log files are consumed by humans too.
32//!
33//! While BaseTime is used to redefine per-interval timestamps for human readability, StartTime
34//! provides information about when the process that is generating the log reached some "start"
35//! condition. It's frequently the case that intervals will start some seconds after the process
36//! started, whether due to initial warmup before a benchmark or because it just takes a
37//! while to start up. If this is relevant to your workload, use StartTime to record the actual
38//! process start time (or other relevant "start" condition, like when a benchmark begins on an
39//! already long-running process). You could then use this when processing a log to more accurately
40//! plot interval data over time. Put differently, StartTime doesn't change the interpretation of
41//! interval timestamps directly like BaseTime; instead, it provides a hint to the consumer of the
42//! log that the "start" (whatever that means to you -- process start, etc) was at a different time
43//! than that associated with the first interval.
44//!
45//! #### Example scenario
46//!
47//! To explain their usage, suppose we're running a multi-hour benchmark on a process that starts
48//! up at a Unix time of 1500000000. We'll be recording separate log files per hour just to
49//! demonstrate the interaction between BaseTime, StartTime, and interval log timestamps.
50//!
51//! The process starts up, warms up its caches, JIT compiles, etc and is ready to start its
52//! benchmark 40 seconds later, so we create the first interval log file and record a StartTime of
53//! 1500000040. If the actual process start (rather than benchmark start) is more useful to you,
54//! using a StartTime of 1500000000 would be reasonable, but we'll stick with 1500000040.
55//!
56//! We'll use a BaseTime of 1500000040 because that's when the benchmark began, but 1500000000 would
57//! also be a reasonable choice here -- it would just make the per-interval deltas 40 seconds
58//! larger, which might be a more useful way of recording them, depending on the situation.
59//!
60//! The benchmark produces an interval histogram for each 60 seconds of workload, so the first one
61//! is ready at 1500000100, and is recorded with a delta timestamp of 60. This goes on for another
62//! hour, with the last one being 3540 seconds after the start of the benchmark with a corresponding
63//! delta of 3540.
64//!
65//! At the start of the second hour, the time is 1500003640 and the first log file is ended (which
66//! is straightforward: just stop writing to it and close the file) and the second log file is
67//! opened. It still uses a StartTime of 1500000040 because we want to represent that this log
68//! pertains to something that started an hour ago, but we'll use a BaseTime of 1500003640 so that
69//! our delta timestamps start over at 0. Again, you could just as well decide to use the same
70//! BaseTime of 1500000040 if you prefer, but it's arguably easier for a human to parse "h hours
71//! into the benchmark, s seconds past the hour" than it is to mentally divide by 3600, so we'll
72//! go with the new BaseTime.
73//!
74//! Suppose now you wanted to process these logs and show information (e.g. the 0.999th quantile of
75//! each interval) as the y-axis on a plot where time is the x-axis. You would want to have
76//! StartTime be the zero of the x-axis. For each interval, calculate the Unix time by adding its
77//! timestamp to BaseTime, where BaseTime is assumed to be zero if it's not specified. The point on
78//! the x-axis for that interval would be the result of subtracting StartTime from that Unix time.
79//! As an example, the 17th minute in the 2nd hour would have an interval timestamp of approximately
80//! 1020, which when added to 1500003640 is 1500004660. The difference between StartTime and then is
81//! 4620. You might choose to display that in units of minutes, which would be 4620 / 60 = 77.
82//!
83//! #### Java interop
84//!
85//! Since you may be interoperating with the Java reference implementation, here's a summary of the
86//! logic used by the `HistogramLogReader` class for StartTime and BaseTime. It's slightly different
87//! than what was described above, presumably for legacy compatibility reasons. This class stores
88//! the StartTime as a field which is exposed via a getter, and also integrates filtering intervals
89//! based on windows for either "absolute" (Unix time) timestamps or "offset" timestamps (delta vs
90//! the StartTime), so if you're using that filtering functionality, you need to understand how it
91//! ends up setting its internal version of StartTime.
92//!
93//! - Neither StartTime nor BaseTime are present: interval timestamps are interpreted as seconds
94//! since the epoch. The first interval's timestamp is stored to the StartTime field.
95//! - StartTime is present: StartTime is a number of seconds since epoch, and interval timestamps
96//! may be interpreted as deltas to be added to StartTime or as "absolute" Unix time depending on a
97//! heuristic. In other words, the heuristic chooses between setting the effective BaseTime to 0 or
98//! to StartTime. Specifically, the heuristic interprets interval timestamps as deltas if they are
99//! more than a year's worth of seconds smaller than StartTime and as absolute timestamps otherwise.
100//! - BaseTime is present: BaseTime is a number of seconds since epoch, and interval timestamps are
101//! interpreted as deltas. The first interval's (delta) timestamp is stored to the StartTime field.
102//! This is likely a bug, since StartTime should be an absolute timestamp, and appears to cause
103//! erroneous behavior when filtering by offset timestamps.
104//! - BaseTime and StartTime are present: The BaseTime is used like it is when it's the only one
105//! present: it's a number of seconds since epoch that serves as the starting point for the
106//! per-interval deltas to get a wall-clock time for each interval. No heuristics are applied to
107//! guess whether or not the intervals are absolute or deltas.
108//!
109//! The Java implementation also supports re-setting the StartTime and BaseTime if those entries
110//! exist more than once in the log. Suppose that you had an hour's worth of per-minute intervals,
111//! and then you set another StartTime to the current time and added a second hour's worth of
112//! intervals. Filtering for "all intervals between 5 and 10 minutes past the start" would result in
113//! two disjoint sequences of intervals, which is at the very least arguably unintuitive. However,
114//! you may well be working with log files that are structured that way, so this implementation's
115//! `IntervalLogIterator` will expose multiple StartTime, etc, entries as they appear in the log
116//! file.
117//!
118//! # Examples
119//!
120//! Parse a single interval from a log.
121//!
122//! ```
123//! use std::time;
124//! use hdrhistogram::serialization::interval_log;
125//!
126//! // two newline-separated log lines: a comment, then an interval
127//! let log = b"#I'm a comment\nTag=t,0.127,1.007,2.769,base64EncodedHisto\n";
128//!
129//! let mut iter = interval_log::IntervalLogIterator::new(&log[..]);
130//!
131//! // the comment is consumed and ignored by the parser, so the first event is an Interval
132//! match iter.next().unwrap() {
133//!     Ok(interval_log::LogEntry::Interval(h)) => {
134//!         assert_eq!(time::Duration::new(0, 127_000_000), h.start_timestamp());
135//!     }
136//!     _ => panic!()
137//! }
138//!
139//! // there are no more lines in the log; iteration complete
140//! assert_eq!(None, iter.next());
141//! ```
142//!
143//! Skip logs that started before 3 seconds.
144//!
145//! ```
146//! use hdrhistogram::serialization::interval_log;
147//!
148//! let log = "\
149//!     #I'm a comment\n\
150//!     Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
151//!     1.456,1.007,2.769,base64EncodedHisto\n\
152//!     3.789,1.007,2.769,base64EncodedHisto\n\
153//!     Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
154//!     5.456,1.007,2.769,base64EncodedHisto\n\
155//!     #Another comment\n"
156//! .as_bytes();
157//!
158//! let iter = interval_log::IntervalLogIterator::new(&log);
159//!
160//! let count = iter
161//!     // only look at intervals (which are the only non-comment lines in this log)
162//!     .filter_map(|e| match e {
163//!         Ok(interval_log::LogEntry::Interval(ilh)) => Some(ilh),
164//!          _ => None
165//!     })
166//!     // do any filtering you want
167//!     .filter(|ilh| ilh.start_timestamp().as_secs() >= 3)
168//!     .count();
169//!
170//! assert_eq!(3, count);
171//! ```
172//!
173//! Write a log.
174//!
175//! ```
176//! use std::{str, time};
177//! use hdrhistogram;
178//! use hdrhistogram::serialization;
179//! use hdrhistogram::serialization::interval_log;
180//!
181//! let mut buf = Vec::new();
182//! let mut serializer = serialization::V2Serializer::new();
183//!
184//! let mut h = hdrhistogram::Histogram::<u64>::new_with_bounds(
185//!     1, u64::max_value(), 3).unwrap();
186//! h.record(12345).unwrap();
187//!
188//! // limit scope of mutable borrow of `buf`
189//! {
190//!     let now = time::SystemTime::now();
191//!     let mut log_writer = interval_log::IntervalLogWriterBuilder::new()
192//!         .add_comment("Comments are great")
193//!         .with_start_time(now)
194//!         .begin_log_with(&mut buf, &mut serializer)
195//!         .unwrap();
196//!
197//!     log_writer.write_comment(
198//!         "You can have comments anywhere in the log").unwrap();
199//!
200//!     log_writer
201//!         .write_histogram(
202//!             &h,
203//!             now.elapsed().unwrap(),
204//!             time::Duration::new(12, 345_678_901),
205//!             interval_log::Tag::new("im-a-tag")
206//!         )
207//!         .unwrap();
208//! }
209//!
210//! // `buf` is now full of stuff; we check for the first line
211//! assert_eq!("#Comments are great\n", &str::from_utf8(&buf).unwrap()[0..20]);
212//! ```
213
214use std::cmp::Ordering;
215use std::error::Error;
216use std::fmt::Write;
217use std::str::FromStr;
218use std::{fmt, io, ops, str, time};
219
220use base64::Engine as _;
221use nom::branch::alt;
222use nom::bytes::complete::{tag, take, take_until, take_while1};
223use nom::character::complete::char;
224use nom::character::is_digit;
225use nom::combinator::{complete, map_res, opt, recognize};
226use nom::error::ErrorKind;
227use nom::number::complete::double;
228use nom::{Err, IResult};
229
230use super::super::{Counter, Histogram};
231use super::Serializer;
232
233/// Prepare an `IntervalLogWriter`.
234///
235/// This type only allows writing comments and headers. Once you're done writing those things, use
236/// `into_log_writer()` to convert this into an `IntervalLogWriter`.
237pub struct IntervalLogWriterBuilder {
238    comments: Vec<String>,
239    start_time: Option<f64>,
240    base_time: Option<f64>,
241    max_value_divisor: f64,
242}
243
244impl Default for IntervalLogWriterBuilder {
245    fn default() -> Self {
246        Self::new()
247    }
248}
249
250impl IntervalLogWriterBuilder {
251    /// Create a new log writer that writes to `writer` and serializes histograms with `serializer`.
252    pub fn new() -> IntervalLogWriterBuilder {
253        IntervalLogWriterBuilder {
254            comments: Vec::new(),
255            start_time: None,
256            base_time: None,
257            max_value_divisor: 1.0,
258        }
259    }
260
261    /// Add a comment line to be written when the writer is built.
262    ///
263    /// Comments containing '\n' will be transformed into multiple lines of comments.
264    pub fn add_comment(&mut self, s: &str) -> &mut Self {
265        self.comments.push(s.to_owned());
266        self
267    }
268
269    /// Set a StartTime. See the module-level documentation for more info.
270    ///
271    /// This can be called multiple times, but only the value for the most recent invocation will
272    /// be written.
273    pub fn with_start_time(&mut self, time: time::SystemTime) -> &mut Self {
274        self.start_time = Some(system_time_as_fp_seconds(time));
275        self
276    }
277
278    /// Set a BaseTime. See the module-level documentation for more info.
279    ///
280    /// This can be called multiple times, but only the value for the most recent invocation will
281    /// be written.
282    pub fn with_base_time(&mut self, time: time::SystemTime) -> &mut Self {
283        self.base_time = Some(system_time_as_fp_seconds(time));
284        self
285    }
286
287    /// Set a max value divisor.
288    ///
289    /// This is used to scale down the max value part of an interval log to something that may be
290    /// more human readable. The max value in the log is only for human consumption, so you might
291    /// prefer to divide by 10<sup>9</sup> to turn nanoseconds into fractional seconds, for
292    /// instance.
293    ///
294    /// If this is not set, 1.0 will be used.
295    ///
296    /// This can be called multiple times, but only the value for the most recent invocation will
297    /// be written.
298    pub fn with_max_value_divisor(&mut self, max_value_divisor: f64) -> &mut Self {
299        self.max_value_divisor = max_value_divisor;
300        self
301    }
302
303    /// Build a LogWriter and apply any configured headers.
304    #[allow(clippy::float_cmp)]
305    pub fn begin_log_with<'a, 'b, W: 'a + io::Write, S: 'b + Serializer>(
306        &self,
307        writer: &'a mut W,
308        serializer: &'b mut S,
309    ) -> Result<IntervalLogWriter<'a, 'b, W, S>, io::Error> {
310        let mut internal_writer = InternalLogWriter {
311            writer,
312            serializer,
313            text_buf: String::new(),
314            serialize_buf: Vec::new(),
315            max_value_divisor: self.max_value_divisor,
316        };
317
318        for c in &self.comments {
319            internal_writer.write_comment(&c)?;
320        }
321
322        if let Some(st) = self.start_time {
323            internal_writer.write_fmt(format_args!(
324                "#[StartTime: {:.3} (seconds since epoch)]\n",
325                st
326            ))?;
327        }
328
329        if let Some(bt) = self.base_time {
330            internal_writer.write_fmt(format_args!(
331                "#[BaseTime: {:.3} (seconds since epoch)]\n",
332                bt
333            ))?;
334        }
335
336        // The Java impl doesn't write a comment for this but it's confusing to silently modify the
337        // max value without leaving a trace
338        if self.max_value_divisor != 1.0_f64 {
339            internal_writer.write_fmt(format_args!(
340                "#[MaxValueDivisor: {:.3}]\n",
341                self.max_value_divisor
342            ))?;
343        }
344
345        Ok(IntervalLogWriter { internal_writer })
346    }
347}
348
349/// Writes interval histograms in an interval log.
350///
351/// This isn't created directly; start with an `IntervalLogWriterBuilder`. Once you've written the
352/// headers and ended up with an `IntervalLogWriter`, typical usage would be to write a histogram
353/// at regular intervals (e.g. once a second).
354///
355/// ```
356/// use hdrhistogram::serialization;
357/// use hdrhistogram::serialization::interval_log;
358///
359/// let mut buf = Vec::new();
360/// let mut serializer = serialization::V2Serializer::new();
361///
362/// // create a writer via a builder
363/// let mut writer = interval_log::IntervalLogWriterBuilder::new()
364///     .begin_log_with(&mut buf, &mut serializer)
365///     .unwrap();
366///
367/// writer.write_comment("Comment 2").unwrap();
368///
369/// // .. write some intervals
370/// ```
371pub struct IntervalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
372    internal_writer: InternalLogWriter<'a, 'b, W, S>,
373}
374
375impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> IntervalLogWriter<'a, 'b, W, S> {
376    /// Write a comment line.
377    ///
378    /// Comments containing '\n' will be transformed into multiple lines of comments.
379    pub fn write_comment(&mut self, s: &str) -> io::Result<()> {
380        self.internal_writer.write_comment(s)
381    }
382
383    /// Write an interval histogram.
384    ///
385    /// `start_timestamp` is the time since the epoch in seconds that measurements started being
386    /// recorded in this interval. If you're using a StartTime or BaseTime offset, you should
387    /// instead use a delta since that time. See the discussion about timestamps in the module-level
388    /// documentation.
389    ///
390    /// `duration` is the duration of the interval in seconds.
391    ///
392    /// `tag` is an optional tag for this histogram.
393    pub fn write_histogram<T: Counter>(
394        &mut self,
395        h: &Histogram<T>,
396        start_timestamp: time::Duration,
397        duration: time::Duration,
398        tag: Option<Tag>,
399    ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
400        self.internal_writer
401            .write_histogram(h, start_timestamp, duration, tag)
402    }
403}
404
405/// Errors that can occur while writing a log.
406#[derive(Debug)]
407pub enum IntervalLogWriterError<E> {
408    /// Histogram serialization failed.
409    SerializeError(E),
410    /// An i/o error occurred.
411    IoError(io::Error),
412}
413
414impl<E> From<io::Error> for IntervalLogWriterError<E> {
415    fn from(e: io::Error) -> Self {
416        IntervalLogWriterError::IoError(e)
417    }
418}
419
420impl<E: fmt::Display + fmt::Debug> fmt::Display for IntervalLogWriterError<E> {
421    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
422        match self {
423            IntervalLogWriterError::SerializeError(e) => {
424                write!(f, "Histogram serialization failed: {}", e)
425            }
426            IntervalLogWriterError::IoError(e) => write!(f, "An i/o error occurred: {}", e),
427        }
428    }
429}
430
431impl<E: Error + 'static> Error for IntervalLogWriterError<E> {
432    fn source(&self) -> Option<&(dyn Error + 'static)> {
433        match self {
434            IntervalLogWriterError::SerializeError(e) => Some(e),
435            IntervalLogWriterError::IoError(e) => Some(e),
436        }
437    }
438}
439
440/// Write interval logs.
441struct InternalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
442    writer: &'a mut W,
443    serializer: &'b mut S,
444    text_buf: String,
445    serialize_buf: Vec<u8>,
446    max_value_divisor: f64,
447}
448
449impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> InternalLogWriter<'a, 'b, W, S> {
450    fn write_fmt(&mut self, args: fmt::Arguments) -> io::Result<()> {
451        self.writer.write_fmt(args)
452    }
453
454    fn write_comment(&mut self, s: &str) -> io::Result<()> {
455        for l in s.split('\n') {
456            writeln!(self.writer, "#{}", l)?;
457        }
458
459        Ok(())
460    }
461
462    fn write_histogram<T: Counter>(
463        &mut self,
464        h: &Histogram<T>,
465        start_timestamp: time::Duration,
466        duration: time::Duration,
467        tag: Option<Tag>,
468    ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
469        self.serialize_buf.clear();
470        self.text_buf.clear();
471
472        if let Some(Tag(s)) = tag {
473            write!(self.text_buf, "Tag={},", &s).expect("Writes to a String can't fail");
474        }
475
476        write!(
477            self.writer,
478            "{}{:.3},{:.3},{:.3},",
479            self.text_buf,
480            duration_as_fp_seconds(start_timestamp),
481            duration_as_fp_seconds(duration),
482            h.max() as f64 / self.max_value_divisor // because the Java impl does it this way
483        )?;
484
485        self.text_buf.clear();
486        let _len = self
487            .serializer
488            .serialize(h, &mut self.serialize_buf)
489            .map_err(IntervalLogWriterError::SerializeError)?;
490        base64::engine::general_purpose::STANDARD
491            .encode_string(&self.serialize_buf, &mut self.text_buf);
492
493        self.writer.write_all(self.text_buf.as_bytes())?;
494        self.writer.write_all(b"\n")?;
495
496        Ok(())
497    }
498}
499
500/// A tag for an interval histogram.
501///
502/// Tags are just `str`s that do not contain a few disallowed characters: ',', '\r', '\n', and ' '.
503///
504/// To get the wrapped `str` back out, use `as_str()` or the `Deref<str>` implementation
505/// (`&some_tag`).
506#[derive(Debug, PartialEq, Clone, Copy)]
507pub struct Tag<'a>(&'a str);
508
509impl<'a> Tag<'a> {
510    /// Create a new Tag.
511    ///
512    /// If a disallowed character is present, this will return `None`.
513    pub fn new(s: &'a str) -> Option<Tag<'a>> {
514        if s.chars()
515            .any(|c| c == ',' || c == '\r' || c == '\n' || c == ' ')
516        {
517            None
518        } else {
519            Some(Tag(s))
520        }
521    }
522
523    /// Returns the tag contents as a str.
524    pub fn as_str(&self) -> &'a str {
525        self.0
526    }
527}
528
529impl<'a> ops::Deref for Tag<'a> {
530    type Target = str;
531
532    fn deref(&self) -> &Self::Target {
533        self.as_str()
534    }
535}
536
537/// An individual interval histogram parsed from an interval log.
538#[derive(PartialEq, Debug)]
539pub struct IntervalLogHistogram<'a> {
540    tag: Option<Tag<'a>>,
541    start_timestamp: time::Duration,
542    duration: time::Duration,
543    max: f64,
544    encoded_histogram: &'a str,
545}
546
547impl<'a> IntervalLogHistogram<'a> {
548    /// Tag, if any is present.
549    pub fn tag(&self) -> Option<Tag<'a>> {
550        self.tag
551    }
552
553    /// Timestamp of the start of the interval in seconds, expressed as a `Duration` relative to
554    /// some start point.
555    ///
556    /// The timestamp may be absolute vs the epoch, or there may be a `StartTime` or `BaseTime` for
557    /// the log, in which case you may wish to consider this number as a delta vs those timestamps.
558    /// See the module-level documentation about timestamps.
559    pub fn start_timestamp(&self) -> time::Duration {
560        self.start_timestamp
561    }
562
563    /// Duration of the interval in seconds.
564    pub fn duration(&self) -> time::Duration {
565        self.duration
566    }
567
568    /// Max value in the encoded histogram
569    ///
570    /// This max value is the max of the histogram divided by some scaling factor (which may be
571    /// 1.0).
572    pub fn max(&self) -> f64 {
573        self.max
574    }
575
576    /// Base64-encoded serialized histogram.
577    ///
578    /// If you need the deserialized histogram, base64-decode and use a `Deserializer` on the
579    /// resulting bytes.
580    pub fn encoded_histogram(&self) -> &'a str {
581        self.encoded_histogram
582    }
583}
584
585#[derive(PartialEq, Debug)]
586/// Represents one non-comment line in an interval log.
587///
588/// One thing to note is that the way your interval timestamps work can vary. If your log was
589/// written with a StartTime or BaseTime, that metadata will appear in header comments, and that
590/// will be represented by the iterator providing the corresponding variants here. The presence
591/// of those timestamps will affect how you should interpret the timestamps for individual
592/// intervals. See the module-level documentation.
593#[allow(variant_size_differences)]
594pub enum LogEntry<'a> {
595    /// Logs may include a StartTime. If present, it represents seconds since the epoch.
596    StartTime(time::Duration),
597    /// Logs may include a BaseTime. If present, it represents seconds since the epoch.
598    BaseTime(time::Duration),
599    /// An individual interval histogram.
600    Interval(IntervalLogHistogram<'a>),
601}
602
603/// Errors that occur when parsing an interval log.
604#[derive(Debug, PartialEq)]
605pub enum LogIteratorError {
606    /// Parsing failed
607    ParseError {
608        /// Offset in the input where the failed parse started
609        offset: usize,
610    },
611}
612
613/// Parse interval logs.
614///
615/// This iterator exposes each item (excluding comments and other information-free lines). See
616/// `LogEntry`.
617///
618/// Because histogram deserialization is deferred, parsing logs is fast. See the `interval_log`
619/// benchmark if you wish to see how it does on your hardware. As a baseline, parsing a log of 1000
620/// random histograms of 10,000 values each takes 2ms total on an E5-1650v3.
621///
622/// Deferring deserialization is handy because it allows you to cheaply navigate the log to find
623/// the records you care about (e.g. ones in a certain time range, or with a certain tag) without
624/// doing all the allocation, etc, of deserialization.
625///
626/// If you're looking for a direct port of the Java impl's `HistogramLogReader`, this isn't one: it
627/// won't deserialize for you, and it pushes the burden of figuring out what to do with StartTime,
628/// BaseTime, etc to you, and there aren't built in functions to filter by timestamp. On the other
629/// hand, because it doesn't do those things, it is much more flexible: you can easily build any
630/// sort of filtering you want, not just timestamp ranges, because you have cheap access to all the
631/// metadata before incurring the cost of deserialization. If you're not using any timestamp
632/// headers, or at least using them in straightforward ways, it is easy to accumulate the
633/// timestamp state you need. Since all the parsing is taken care of already, writing your own
634/// `HistogramLogReader` equivalent that fits the way your logs are assembled is just a couple of
635/// lines. (And if you're doing complex stuff, we probably wouldn't have built something that fits
636/// your quirky logs anyway!)
637///
638/// This parses from a slice representing the complete file because it made implementation easier
639/// (and also supports mmap'd files for maximum parsing speed). If parsing from a `Read` is
640/// important for your use case, open an issue about it.
641pub struct IntervalLogIterator<'a> {
642    orig_len: usize,
643    input: &'a [u8],
644    ended: bool,
645}
646
647impl<'a> IntervalLogIterator<'a> {
648    /// Create a new iterator from the UTF-8 bytes of an interval log.
649    pub fn new(input: &'a [u8]) -> IntervalLogIterator<'a> {
650        IntervalLogIterator {
651            orig_len: input.len(),
652            input,
653            ended: false,
654        }
655    }
656}
657
658impl<'a> Iterator for IntervalLogIterator<'a> {
659    type Item = Result<LogEntry<'a>, LogIteratorError>;
660
661    fn next(&mut self) -> Option<Self::Item> {
662        loop {
663            if self.ended {
664                return None;
665            }
666
667            if self.input.is_empty() {
668                self.ended = true;
669                return None;
670            }
671
672            // Look for magic comments first otherwise they will get matched by the simple comment
673            // parser
674            if let Ok((rest, e)) = log_entry(self.input) {
675                self.input = rest;
676                return Some(Ok(e));
677            }
678
679            // it wasn't a log entry; try parsing a comment
680            match ignored_line(self.input) {
681                Ok((rest, _)) => {
682                    self.input = rest;
683                    continue;
684                }
685                _ => {
686                    self.ended = true;
687                    return Some(Err(LogIteratorError::ParseError {
688                        offset: self.orig_len - self.input.len(),
689                    }));
690                }
691            }
692        }
693    }
694}
695
696fn duration_as_fp_seconds(d: time::Duration) -> f64 {
697    d.as_secs() as f64 + f64::from(d.subsec_nanos()) / 1_000_000_000_f64
698}
699
700fn system_time_as_fp_seconds(time: time::SystemTime) -> f64 {
701    match time.duration_since(time::UNIX_EPOCH) {
702        Ok(dur_after_epoch) => duration_as_fp_seconds(dur_after_epoch),
703        // Doesn't seem possible to be before the epoch, but using a negative number seems like
704        // a reasonable representation if it does occur
705        Err(t) => duration_as_fp_seconds(t.duration()) * -1_f64,
706    }
707}
708
709fn start_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
710    let (input, _) = tag("#[StartTime: ")(input)?;
711    let (input, duration) = fract_sec_duration(input)?;
712    let (input, _) = char(' ')(input)?;
713    let (input, _) = take_until("\n")(input)?;
714    let (input, _) = take(1_usize)(input)?;
715    Ok((input, LogEntry::StartTime(duration)))
716}
717
718fn base_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
719    let (input, _) = tag("#[BaseTime: ")(input)?;
720    let (input, duration) = fract_sec_duration(input)?;
721    let (input, _) = char(' ')(input)?;
722    let (input, _) = take_until("\n")(input)?;
723    let (input, _) = take(1_usize)(input)?;
724    Ok((input, LogEntry::BaseTime(duration)))
725}
726
727fn tag_bytes(input: &[u8]) -> IResult<&[u8], &[u8]> {
728    let (input, _) = tag("Tag=")(input)?;
729    let (input, tag) = take_until(",")(input)?;
730    let (input, _) = take(1_usize)(input)?;
731    Ok((input, tag))
732}
733
734fn tag_parser(input: &[u8]) -> IResult<&[u8], Tag> {
735    let (input, tag) = map_res(tag_bytes, str::from_utf8)(input)?;
736    Ok((input, Tag(tag)))
737}
738
739fn interval_hist(input: &[u8]) -> IResult<&[u8], LogEntry> {
740    let (input, tag) = opt(tag_parser)(input)?;
741    let (input, start_timestamp) = fract_sec_duration(input)?;
742    let (input, _) = char(',')(input)?;
743    let (input, duration) = fract_sec_duration(input)?;
744    let (input, _) = char(',')(input)?;
745    let (input, max) = double(input)?;
746    let (input, _) = char(',')(input)?;
747    let (input, encoded_histogram) = map_res(take_until("\n"), str::from_utf8)(input)?;
748    // Be nice to Windows users:
749    let encoded_histogram = encoded_histogram.trim_end_matches('\r');
750    let (input, _) = take(1_usize)(input)?;
751
752    Ok((
753        input,
754        LogEntry::Interval(IntervalLogHistogram {
755            tag,
756            start_timestamp,
757            duration,
758            max,
759            encoded_histogram,
760        }),
761    ))
762}
763
764fn log_entry(input: &[u8]) -> IResult<&[u8], LogEntry<'_>> {
765    complete(alt((start_time, base_time, interval_hist)))(input)
766}
767
768fn comment_line(input: &[u8]) -> IResult<&[u8], ()> {
769    let (input, _) = tag("#")(input)?;
770    let (input, _) = take_until("\n")(input)?;
771    let (input, _) = take(1_usize)(input)?;
772    Ok((input, ()))
773}
774
775fn legend(input: &[u8]) -> IResult<&[u8], ()> {
776    let (input, _) = tag("\"StartTimestamp\"")(input)?;
777    let (input, _) = take_until("\n")(input)?;
778    let (input, _) = take(1_usize)(input)?;
779    Ok((input, ()))
780}
781
782fn ignored_line(input: &[u8]) -> IResult<&[u8], ()> {
783    alt((comment_line, legend))(input)
784}
785
786fn fract_sec_duration(input: &[u8]) -> IResult<&[u8], time::Duration> {
787    let (rest, data) = fract_sec_tuple(input)?;
788    let (secs, nanos_str) = data;
789
790    // only read up to 9 digits since we can only support nanos, not smaller precision
791    let nanos_parse_res = match nanos_str.len().cmp(&9) {
792        Ordering::Greater => nanos_str[0..9].parse::<u32>(),
793        Ordering::Equal => nanos_str.parse::<u32>(),
794        Ordering::Less => nanos_str
795            .parse::<u32>()
796            // subtraction will not overflow because len is < 9
797            .map(|n| n * 10_u32.pow(9 - nanos_str.len() as u32)),
798    };
799
800    if let Ok(nanos) = nanos_parse_res {
801        return Ok((rest, time::Duration::new(secs, nanos)));
802    }
803
804    // nanos were invalid utf8. We don't expose these errors, so don't bother defining a
805    // custom error type.
806    Err(Err::Error(error_position!(input, ErrorKind::Alpha)))
807}
808
809type FResult<'a> = IResult<&'a [u8], (u64, &'a str)>;
810
811fn fract_sec_tuple(input: &[u8]) -> FResult {
812    let (input, secs) = map_res(
813        map_res(recognize(take_until(".")), str::from_utf8),
814        u64::from_str,
815    )(input)?;
816    let (input, _) = tag(".")(input)?;
817    let (input, nanos_str) = map_res(complete(take_while1(is_digit)), str::from_utf8)(input)?;
818    Ok((input, (secs, nanos_str)))
819}
820
821#[cfg(test)]
822mod tests;