Map
The first combinator we are going describe is the Map combinator.
Let's say we have a parser to parse a digit, i.e.
any(|c: char| c.is_ascii_digit()). But we are not interested in the
character, but in the integer.
char has a method that can return the numeric value of a digit
character. It has the following signature.
pub fn to_digit(self, radix: u32) -> Option<u32>
Notice that the return type is an Option, because if all you have is
a character, you are not sure how to turn it into a number. E.g. What
is the value of Y?
So the code we would like to run is something along the lines of
#[test]
fn parse_any_digit_as_number() {
let input = "1230";
let parser = map(
any(|c: char| c.is_ascii_digit()),
|c: char| c.to_digit(10).unwrap_or(0));
let actual = parser.parse(input);
let expected = Ok((1, "230"));
assert_eq!(actual, expected);
}
During the mapping we are unwrapping the option with a default value of zero.
Map
With our goal clear, we still are following our strategy. So, let's first create a struct.
pub struct Map<'a, I, O, P, F> where I: 'a, P: Parser<'a, I> + Sized, F: Fn(I) -> O + Sized {
parser: P,
map: F,
}
Woah, that signature has a lot of stuff in it. Let's break it done step by step.
Lifetimes & Generics
First of all there is a lifetime ``aparameter. This lifetime is needed byParser` and keeps track of the lifetime of the input.
Next are generic parameters I and O. They are used by the other generic
parameters P and F. Almost all of these are aliases.
I is restricted to I: 'a. It should live as long as our input. Together
with the chosen letter I, it probably stands for some form of input.
P is restricted to P: Parser<'a, I> + Sized. So P is a parser that
produces an I. For example our any parser is a produces an char.
F is restricted to F: Fn(I) -> O + Sized. Now it becomes clear would
I and O are. They are the input and output for the function F. So
F is a function that is used to map the result of parser P, which is
of type I, into type O.
Pfew, that was a lot to take in. But with an overview all the parts make sense.
PhantomData
Unfortunatly, this will not compile. The compiler warns us of several unused parameters. E.g.
82 | pub struct Map<'a, I, O, P, F> where I: 'a, P: Parser<'a, I> + Sized, F: Fn(I) -> O + Sized {
| ^^ unused parameter
|
= help: consider removing `'a` or using a marker such as `std::marker::PhantomData`
The advice of removing the lifetime parameter is not an option for us.
We need it for our parser P. Maybe the other advice is viable.
The doccumentation of PhantomData reads
Zero-sized type used to mark things that "act like" they own a T.
Adding a PhantomData
field to your type tells the compiler that your type acts as though it stores a value of type T, even though it doesn't really. This information is used when computing certain safety properties.
That is precisely our case. What a good compiler our compiler is. So we will follow our compilers advice and add a phantom field to our struct.
phantom: PhantomData<&'a I>,
Which satisfies the compiler.
Impl Parser
With our struct out of the way, we should focus on the implementation. What we want to achieve is
- Try the parser.
- Map the result with our function, keeping the rest of our input.
Besides the verbose type signatures this translates very well into Rust.
impl<'a, I, O, P, F> Parser<'a, O> for Map<'a, I, O, P, F> where I: 'a, P: Parser<'a, I> + Sized, F: Fn(I) -> O + Sized {
fn parse(&self, input: &'a str) -> Result<(O, &'a str), ParseError> {
let attempt = self.parser.parse(input);
attempt.map(|(v, rest)|{ ((self.map)(v), rest)})
}
}
Constructor & Factory
Creating the constructor and factory is almost straight forward, albeit
verbose. Accept the parameters you need and create the Map. The only
snag is that PhantomData.
Luckily the way to assign a PhantomData is by doing just that.
Summary
We have just created our first combinator! It takes a parser and transforms its result into something different.
Exercises
- Implement the
Mapparser. - Write some tests to check your implementation.