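Parsec fails without error if reading from file

Problem

I wrote a small parser to read samples from either a user-provided input string or an input file. When the input is given as a semicolon-separated string, it fails correctly on bad input and shows a useful error message: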

> readUncalc14String "test1,7444,37;6800,36;testA,testB,2000,222;test3,7750,40"
*** Exception: Error in parsing dates from string: (line 1,column 29):
unexpected "t"
expecting digit

But for an input file inputFile.txt with the same entries, it fails silently:

test1,7444,37
6800,36
testA,testB,2000,222
test3,7750,40

> readUncalc14FromFile "inputFile.txt"
[Uncalc14 "test1" 7444 37,Uncalc14 "unkNownSampleName" 6800 36]

Why is that, and how can I make readUncalc14FromFile fail in a useful way as well?

Here is a minimal subset of my code:

import qualified Text.Parsec                    as P
import qualified Text.Parsec.String             as P

data Uncalc14 = Uncalc14 String Int Int deriving Show

readUncalc14FromFile :: FilePath -> IO [Uncalc14]
readUncalc14FromFile uncalFile = do
    s <- readFile uncalFile
    case P.runParser uncalc14SepByNewline () "" s of
        Left err -> error $ "Error in parsing dates from file: " ++ show err
        Right x -> return x
    where
        uncalc14SepByNewline :: P.Parser [Uncalc14]
        uncalc14SepByNewline = P.endBy parSEOneUncalc14 (P.newline <* P.spaces)

readUncalc14String :: String -> Either String [Uncalc14]
readUncalc14String s = 
    case P.runParser uncalc14SepBySemicolon () "" s of
        Left err -> error $ "Error in parsing dates from string: " ++ show err
        Right x -> Right x
    where 
        uncalc14SepBySemicolon :: P.Parser [Uncalc14]
        uncalc14SepBySemicolon = P.sepBy parSEOneUncalc14 (P.char ';' <* P.spaces)

parSEOneUncalc14 :: P.Parser Uncalc14
parSEOneUncalc14 = do
    P.try long P.<|> short
    where
        long = do
            name <- P.many (P.noneOf ",")
            _ <- P.oneOf ","
            mean <- read <$> P.many1 P.digit
            _ <- P.oneOf ","
            std <- read <$> P.many1 P.digit
            return (Uncalc14 name mean std)
        short = do
            mean <- read <$> P.many1 P.digit
            _ <- P.oneOf ","
            std <- read <$> P.many1 P.digit
            return (Uncalc14 "unkNownSampleName" mean std)

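Solution

What is happening here is that a prefix of your input is itself valid input for the parser, so the parser succeeds on that prefix and ignores the rest. To force Parsec to consume the whole input, you can use the eof parser: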

uncalc14SepByNewline = P.endBy parSEOneUncalc14 (P.newline <* P.spaces) <* P.eof
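As a rough, self-contained sketch of that fix, the snippet below reuses the entry parser from the question and runs it on a String rather than a file, so it can be tried in GHCi. The name parseUncalc14Lines is a hypothetical helper introduced only for illustration; it is not part of the original code.

```haskell
import qualified Text.Parsec        as P
import qualified Text.Parsec.String as P

data Uncalc14 = Uncalc14 String Int Int deriving Show

-- Entry parser as in the question: "name,mean,std" or "mean,std".
parSEOneUncalc14 :: P.Parser Uncalc14
parSEOneUncalc14 = P.try long P.<|> short
  where
    long = do
      name <- P.many (P.noneOf ",")
      _    <- P.oneOf ","
      mean <- read <$> P.many1 P.digit
      _    <- P.oneOf ","
      std  <- read <$> P.many1 P.digit
      return (Uncalc14 name mean std)
    short = do
      mean <- read <$> P.many1 P.digit
      _    <- P.oneOf ","
      std  <- read <$> P.many1 P.digit
      return (Uncalc14 "unkNownSampleName" mean std)

-- Hypothetical helper (not in the original code): parse newline-terminated
-- entries and require that nothing is left over afterwards.
parseUncalc14Lines :: String -> Either P.ParseError [Uncalc14]
parseUncalc14Lines =
  P.runParser (P.endBy parSEOneUncalc14 (P.newline <* P.spaces) <* P.eof) () ""
```

With the trailing P.eof, input such as "test1,7444,37\n6800,36\ntestA,testB,2000,222\n" should produce a Left value instead of a truncated list, while well-formed input still yields Right with all entries.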

The reason one version reports the error and the other does not is the difference between sepBy and endBy. Here is a simpler example:

sepTest,endTest :: String -> Either P.ParseError String
sepTest s = P.runParser (P.sepBy (P.char 'a') (P.char 'b')) () "" s
endTest s = P.runParser (P.endBy (P.char 'a') (P.char 'b')) () "" s

Here are some interesting cases:

ghci> sepTest "abababb"
Left (line 1,column 7):
unexpected "b"
expecting "a"

ghci> endTest "abababb"
Right "aaa"

ghci> sepTest "ababaa"
Right "aaa"

ghci> endTest "ababaa"
Left (line 1,column 6):
unexpected "a"
expecting "b"

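As you can see, both sepBy and endBy can fail silently, but sepBy fails silently if the prefix does not end in the separator b, and endBy fails silently if the prefix does not end in the main parser a. In both cases the silent stop happens because the next parser fails without consuming any input, which the combinator treats as the end of the list; a failure after some input has been consumed is reported as an error.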

So you should use eof after both parsers if you want to make sure you read the whole file/string.
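To illustrate, here is a minimal sketch of the two toy parsers with eof appended. The names sepTestEof and endTestEof are hypothetical variants of sepTest and endTest, added here only for comparison.

```haskell
import qualified Text.Parsec as P

-- Hypothetical variants of sepTest/endTest that must consume all input.
sepTestEof, endTestEof :: String -> Either P.ParseError String
sepTestEof = P.runParser (P.sepBy (P.char 'a') (P.char 'b') <* P.eof) () ""
endTestEof = P.runParser (P.endBy (P.char 'a') (P.char 'b') <* P.eof) () ""
```

With these definitions, the two inputs that previously succeeded on a prefix ("ababaa" for sepBy and "abababb" for endBy) should now return a Left value, while fully valid inputs such as "ababa" for sepTestEof and "ababab" for endTestEof still succeed.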
