dokee's site


Modern C++ Basics - String and Stream

String and string view#

std::string#

APIs#

  • Before C++20, .reserve() in std::string could also shrink the capacity; since C++20 it never shrinks, the same as std::vector.
  • .assign()/.insert()/.erase()/.append() also provide index-based overloads.
    • All index-based overloads return std::string& instead of an iterator.
  • +/+=/hash
  • .starts_with()/.ends_with()/.contains()
  • .substr(): return a new string
  • .replace()/.replace_with_range()
  • .data() -> char*/.c_str() -> const char*
  • Search: .find()/.rfind()/.find_first(_not)_of()/.find_last(_not)_of()
    • Return index instead of iterator.
    • Return std::string::npos (i.e. static_cast<size_t>(-1)) if not found.
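
As a small illustration of the index-based interface, the sketch below checks the result of .find() against npos and chains index-based calls. The helper names strip_comment and bracketed are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops everything from the first "//" onward.
std::string strip_comment(std::string line) {
    const std::size_t pos = line.find("//");  // an index, not an iterator
    if (pos != std::string::npos) {           // npos means "not found"
        line.erase(pos);                      // index-based erase: removes [pos, end)
    }
    return line;
}

// Hypothetical helper: index-based calls return std::string&, so they chain.
std::string bracketed(std::string text) {
    text.insert(0, "[").append("]");
    return text;
}
```

For example, strip_comment("int x; // counter") keeps only "int x; ", and a line without "//" is returned unchanged.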

Notes#

  • It is essentially an enhanced std::vector<char>.
  • std::string guarantees that the underlying buffer is null-terminated (i.e. ends with '\0').
    • The string itself may also contain '\0', because its end is determined by .size() rather than by the first '\0' as in a C-style string.
  • It has SSO (small string optimization).
  • Convert a string to/back from a number:
    • std::stoi/sto(u)l/sto(u)ll(string, std::size_t* end = nullptr, int base = 10)
    • std::stof/stod/stold(string, std::size_t* end = nullptr)
    • std::to_string(): same as std::format("{}", val) since C++26
  • .resize_and_overwrite(newSize, Op)
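
The notes on embedded '\0' and on number conversion can be sketched as follows. ParsedInt, parse_leading_int, with_embedded_null and c_length are names invented for this example, not part of the standard library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct ParsedInt {         // hypothetical result type for this sketch
    int value;
    std::size_t consumed;  // number of characters std::stoi consumed
};

// Parses the leading integer; std::stoi throws std::invalid_argument or
// std::out_of_range when no conversion is possible.
ParsedInt parse_leading_int(const std::string& text, int base = 10) {
    std::size_t end = 0;
    const int value = std::stoi(text, &end, base);
    return {value, end};
}

// Builds a 5-character string with a '\0' in the middle.
std::string with_embedded_null() {
    return std::string("ab\0cd", 5);  // (pointer, count) ctor keeps the '\0'
}

// The C-style length stops at the first '\0', unlike .size().
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here with_embedded_null().size() is 5 while c_length() of the same string is 2, which is the difference between .size() and C-style termination described above.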

std::string_view#

  • It’s like a specialization of std::span<const char>, i.e. it just has a const char* with a length.
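// Transparent hasher: the is_transparent tag lets an unordered container keyed by
// std::string hash a std::string_view directly (C++20, when paired with a
// transparent key equality such as std::equal_to<>), without a temporary std::string.
class Hasher {
public:
    using is_transparent = void;
    auto operator()(std::string_view sv) const {
        return std::hash<std::string_view>()(sv);
    }
};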
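std::vector<std::string> vec { "PKU", "THU", "CMU" };
// Sort by everything after the first character; substr() on a std::string_view
// does not allocate, unlike std::string::substr().
std::ranges::sort(vec, [](const std::string& s1, const std::string& s2) {
    return std::string_view{ s1 }.substr(1) < std::string_view{ s2 }.substr(1);
});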

Caveats#

  • std::string_view is not required to be null-terminated.
    • Passing .data() to C-string APIs may therefore be unsafe.
  • The pointer it contains can be nullptr (as default ctor does).
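  • Be very cautious about returning a std::string_view: it dangles once the string it refers to is destroyed or reallocated.
  • If you will create a std::string from it anyway (as in a ctor that stores the string), passing a std::string_view is not a good idea; taking std::string by value and moving it avoids an extra copy for rvalue arguments.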
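
A short sketch of two of these caveats: copying a view before handing it to a C API, and taking a sink parameter by value. to_c_compatible and Person are hypothetical names used only for illustration.

```cpp
#include <string>
#include <string_view>
#include <utility>

// A C API needs a terminator, so copy the view into a std::string first;
// .c_str() on the result is guaranteed to be null-terminated.
std::string to_c_compatible(std::string_view sv) {
    return std::string{sv};
}

// Sink parameter: the string is stored anyway, so take it by value and move.
class Person {  // hypothetical class
public:
    explicit Person(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

If sv is the first five characters of "hello world", sv.data() still points at the whole literal, so a C API reading up to '\0' would see "hello world" rather than "hello".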

Misc#

  • Character: 'a', '\n', '\0', '\123', '\x12', '\o{12}', '\x{12}'

  • Raw strings: R"(\\\n\")" R"+(I want a )"!)+"

  • using namespace std::literals

    • "xxx"s -> std::string
    • "xxx"sv -> std::string_view
    • Time-related: 1s, 1ms, 1d
    • Complex-related: 1i, 1.5if, 2.5il
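  • User-defined literals: operator""_xx

// Written without a space after "" so that _KB is parsed as a literal suffix
// rather than as the reserved identifier _KB.
constexpr unsigned int operator""_KB(unsigned long long m) {
    return static_cast<unsigned int>(m) * 1024;
}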
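
To make the standard literal suffixes concrete, here is a minimal sketch; the function names are invented for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::literals;

// "..."s yields a std::string, so operator+ is available.
std::string greeting() {
    return "Hello, "s + "world";
}

// "..."sv yields a std::string_view over the literal; nothing is allocated.
std::size_t view_length() {
    return "PKU"sv.size();
}

// Duration literals of different units combine into the finer unit.
std::chrono::milliseconds total_delay() {
    return 1s + 500ms;
}
```

Here total_delay().count() is 1500, because adding seconds to milliseconds yields milliseconds.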
  • std::stoi()/std::to_string() will create new std::string (costly); we may want to provide storage ourselves.
  • You can use std::from_chars and std::to_chars in <charconv>.
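
A minimal sketch of conversions that use caller-provided storage through <charconv> (C++17). parse_int and write_int are hypothetical wrappers written for this example.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole view as a decimal int; no std::string is created.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // invalid input, out of range, or trailing characters
    }
    return value;
}

// Writes value into the caller's buffer and returns a view of the written
// characters; the view is empty if the buffer is too small.
std::string_view write_int(int value, char* buffer, std::size_t capacity) {
    const auto [ptr, ec] = std::to_chars(buffer, buffer + capacity, value);
    if (ec != std::errc{}) {
        return {};
    }
    return std::string_view(buffer, static_cast<std::size_t>(ptr - buffer));
}
```

Note that std::to_chars does not write a terminating '\0', which is why the sketch returns a std::string_view with an explicit length.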

Format#

  • std::format()
  • {order : fill - align - sign - # - 0 - width - .precision - L - type}
    • align: </^/>
    • sign: +/-/space
    • type:
      • Integer: b/B/d/o/x/X
        • For bool, s is the default.
        • For char/wchar_t, c is the default; ? (escaped output, C++23) is also available.
      • Floating point: e/f/g/a/E/F/G/A
      • String: s/?
      • Pointer: p/P
    • #:
      • Integer: b/B/o/x/X -> 0b/0B/0/0x/0X
      • Floating point: dot will always be shown.
        • With an explicit #g/#G, trailing zeros are not removed.
    • .precision:
      • Floating point: precision.
      • String: the maximum characters to output.
    • 0: Pads with leading zeros after the sign and prefix; applies only to integers and floating-point numbers.
  • width and precision can be determined at runtime:
    • std::format("{:{}.{}e}", 3.14f, 3, 10);
  • std::format_to(OutIt, ...)/std::format_to_n(OutIt, n, ...)
  • std::runtime_format()
  • For ranges: n/m/nm
    • Also support fill, align, width specifiers.
    • e.g. {:*^50n::#x} for std::vector<std::array<int, 2>>
    • For string elements, {::} differs from {}: with {} each element is printed in its quoted, escaped (debug) form, while {::} prints it as-is.

User-defined format#

enum class Color {
    Red = 0xff0000,
    Green = 0x00ff00,
    Blue = 0x0000ff,
    White = 0xffffff,
};

template<>
struct std::formatter<Color> {
    char type = 's';
    constexpr auto parse(const std::format_parse_context& context) {
        auto it = context.begin();
        if (it == context.end() or *it == '}') {
            return it;
        }
        type = *(it++);

        if (type != 'x' and type != 's') {
            throw std::format_error{ "unrecognized color format." };
        }
        return it;
    }

    auto format(Color color, auto& context) const {
        auto format_by_type = [it = context.out(), type = type](std::string_view string_info, std::string_view number_info) {
            return type == 's' ? std::format_to(it, "{}", string_info) : std::format_to(it, "{}", number_info);
        };
        switch (color) {
            using enum Color;
            case Red:
                return format_by_type("Red", "#ff0000");
            case Green:
                return format_by_type("Green", "#00ff00");
            case Blue:
                return format_by_type("Blue", "#0000ff");
            case White:
                return format_by_type("White", "#ffffff");
            default:
                auto it = context.out();
                if (type == 's') {
                    it = std::format_to(it, "Unknown color: ");
                }
                return std::format_to(it, "#{:0>6x}", std::to_underlying(color));
        }
    }
};
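  • The default branch handles values that have no named enumerator, e.g. static_cast<Color>(0x123456).
    • This is legal for a scoped enumeration: its underlying type is fixed (int by default), so every value of that type is valid. The (1 << (MSB(MaxEnum) + 1)) - 1 limit, which White = 0xffffff would raise to 0xffffff, applies only to unscoped enumerations without a fixed underlying type; converting a value beyond it is UB.
  • Declare the context parameters as auto& / const auto& so that the formatter also works with other character types (e.g. std::wstring).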
  • std::range_formatter<T>: to be inherited by std::formatter<Container<T>>
    • .set_brackets(left, right)
    • .set_separator(sep)
    • .underlying()

Stream#

Stream overview#

  • System calls are relatively expensive, so a stream keeps a buffer and, when reading, usually just advances a pointer into that buffer.

Output stream#

  • The stream buffer can be obtained with .rdbuf(), which returns a std::basic_streambuf*.

    • It is a base class with a protected ctor; the actual buffer is accessed through polymorphism.
    • Its protected virtual methods can be overridden in derived classes.
  • There are three bits for stream status:

    • std::ios_base::eofbit: commonly used to denote end of stream.
    • std::ios_base::failbit: commonly used for parsing errors, e.g. if you std::cin >> some_float but the input is the character 'a'; also set when opening/closing a file fails.
    • std::ios_base::badbit: some irrecoverable error happens.
    • Operations:
      • get: .rdstate()
      • test: .rdstate() & std::ios_base::eofbit, or .good()/.eof()/.fail()/.bad() -> bool
      • set: .clear(states = goodbit)
      • add: .setstate(states) or .clear(rdstate() | states)
      • throw: .exceptions(states)
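
The state operations can be tried on a std::istringstream. This is a minimal sketch with helper names invented for the example.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Returns the state bits after trying to extract one double.
std::ios_base::iostate state_after_reading_double(const std::string& text) {
    std::istringstream in{text};
    double value = 0.0;
    in >> value;
    return in.rdstate();
}

// A failed extraction sets failbit; .clear() resets the state to goodbit
// so that the stream can be read again.
std::string recover_and_read_word(const std::string& text) {
    std::istringstream in{text};
    int number = 0;
    in >> number;
    if (in.fail()) {
        in.clear();
    }
    std::string word;
    in >> word;
    return word;
}

// With .exceptions(failbit), a parsing error throws instead of failing silently.
bool throws_on_failure(const std::string& text) {
    std::istringstream in{text};
    in.exceptions(std::ios_base::failbit);
    try {
        int number = 0;
        in >> number;
    } catch (const std::ios_base::failure&) {
        return true;
    }
    return false;
}
```

Reading "3.5" sets eofbit but not failbit, since the value was parsed and the input then ran out; reading "a" sets failbit.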

Input stream#

Bidirectional stream and stream linking#

  • Bidirectional stream: Whether they share the same position is determined by the derived class.
  • Stream linking: before the istream reads anything, the tied ostream is flushed automatically.
    • By .tie(basic_ostream*).
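
A small sketch of inspecting and changing the tie. cin_is_tied_to_cout and untie are hypothetical helpers written for this example.

```cpp
#include <iostream>
#include <sstream>

// By default std::cin is tied to std::cout, so a prompt is flushed before input is read.
bool cin_is_tied_to_cout() {
    return std::cin.tie() == &std::cout;
}

// Removes any tie and returns the previously tied stream (nullptr if none).
std::ostream* untie(std::istream& in) {
    return in.tie(nullptr);
}
```

Untying std::cin is a common way to avoid a flush before every read in I/O-heavy programs, at the cost of prompts possibly appearing late.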

Standard streams#

File stream#

  • Open mode: in/out/trunc/noreplace/ate/app/binary
    • If you use in | out (the default for std::fstream), truncation will not happen automatically.
    • Use noreplace (C++23) if you want opening to fail when the file already exists, rather than overwrite it.
    • ate seeks to the end only once, on opening, while app appends at the end on every write (even .seekp() doesn't affect it).
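
The difference between ate and app can be observed with a small experiment. This is only a sketch: read_all and seek_to_start_and_write are helper names invented here, and the file path is up to the caller.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file into a string.
std::string read_all(const std::string& path) {
    std::ifstream in{path, std::ios::binary};
    return std::string(std::istreambuf_iterator<char>{in},
                       std::istreambuf_iterator<char>{});
}

// Opens path with the given mode, moves the put position to 0, then writes text.
void seek_to_start_and_write(const std::string& path,
                             std::ios::openmode mode,
                             const std::string& text) {
    std::fstream file{path, mode};
    file.seekp(0);
    file << text;
}
```

Starting from a file containing "hello", writing "X" with out | app still appends and gives "helloX", whereas writing "Y" with in | out | ate honors the seek and overwrites the first character. Note that out | ate without in would truncate the file on opening.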

String stream#

  • You can use .str() to copy it out, or replace with a new string by .str(newStr).
  • Use .view() to get the std::string_view to it.
    • But notice that later writes may reallocate the buffer and invalidate the view!

Span stream#

  • Writes go directly into a caller-provided buffer; output that exceeds the buffer is truncated rather than triggering a reallocation.
    • badbit will be set in this case.
    • You must ensure that the buffer outlives the span stream that operates on it.
  • Get view: .span() returns a std::span<CharT>.
    • Notice: if the mode contains out, it returns [pbase, pptr), i.e. the part that has already been written; otherwise it returns the whole buffer.

Synchronized stream#

  • Each thread creates its own std::osyncstream wrapping the shared stream.
  • Its buffered output is transferred to the wrapped stream without data races when the std::osyncstream is destroyed, when .emit() is called, or when you use std::flush_emit explicitly.
  • If you want / don't want every std::flush to emit, use << std::emit_on_flush / std::noemit_on_flush.
https://astro-pure.js.org/blog/c/modern-c-basics/string-and-stream
Author dokee
Published at March 10, 2025