How to check paragraph content when reading a .docx file line by line in C#

Problem description

After the upload, I want to read the .docx file line by line.

My file.docx is structured as follows:
Chapter 1 - Events
alert or disservices
significant activities

Chapter 2 – Safety
near miss
security checks

Chapter 3 – Training
environment
upkeep

I tried reading the document with Microsoft.Office.Interop.Word.

[Screenshot of the whole document, not available]

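Now, depending on the chapter, I have to insert the chapter and the content of its paragraphs into the corresponding database table.

For example: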

Chapter 1 - Events
- alert or disservices
Lorem ipsum dolor sit amet, consectetur adipiscing elit ….
…. ….
…. ….
- significant activities
Phasellus dui nunc, rutrum vitae dictum eleifend, ullamcorper hendrerit sem ….
…. ….
…. ….

This has to be inserted into the table events:

-- ----------------------------
-- Table structure for events
-- ----------------------------
DROP TABLE IF EXISTS `events`;
CREATE TABLE `events` (
  `sID` int(11) NOT NULL AUTO_INCREMENT,
  `alert_or_disservices` longtext,
  `significant_activities` longtext,
  PRIMARY KEY (`sID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
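Column names cannot be passed as SQL parameters, so with one table per chapter and one column per subheading (as in the events table above), the table and column identifiers have to be derived from the heading text and sanitized, while the paragraph text itself is bound as parameter values. A minimal sketch of that mapping, assuming the identifiers are simply the lowercased titles with every run of non-alphanumeric characters replaced by an underscore; ChapterSql, ToIdentifier and BuildInsert are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a parameterized INSERT for one chapter table.
public static class ChapterSql
{
    // "alert or disservices" -> "alert_or_disservices", "Events" -> "events".
    // The result only contains [a-z0-9_], so it is safe to wrap in backticks.
    public static string ToIdentifier(string title)
    {
        string lower = title.Trim().ToLowerInvariant();
        string id = Regex.Replace(lower, "[^a-z0-9]+", "_").Trim('_');
        if (id.Length == 0)
            throw new ArgumentException("Title has no usable characters: " + title);
        return id;
    }

    // Identifiers come from the headings; values are left as @p0, @p1, ... placeholders.
    public static string BuildInsert(string chapterName, IReadOnlyList<string> subheadings)
    {
        string table = ToIdentifier(chapterName);
        IEnumerable<string> columns = subheadings.Select(s => "`" + ToIdentifier(s) + "`");
        IEnumerable<string> placeholders = subheadings.Select((s, i) => "@p" + i);
        return "INSERT INTO `" + table + "` (" + string.Join(", ", columns) +
               ") VALUES (" + string.Join(", ", placeholders) + ")";
    }
}
```

For chapter "Events" with subheadings "alert or disservices" and "significant activities" this yields INSERT INTO `events` (`alert_or_disservices`, `significant_activities`) VALUES (@p0, @p1). With MySQL Connector/NET the contents would then be bound with cmd.Parameters.AddWithValue("@p0", ...) and so on, in the same order as the subheadings.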

Can you please help me?

Thanks in advance for any help or suggestions.

My code is below:

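using System;
using System.Reflection;
using System.Text;
using Word = Microsoft.Office.Interop.Word;

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        var word = new Word.Application();
        Word.Document docs = null;
        try
        {
            object miss = Missing.Value;
            object path = @"C:\file.docx";   // verbatim string: one backslash is enough
            object readOnly = true;
            docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss);

            var totaltext = new StringBuilder();   // the whole document

            // Word collections are 1-based
            for (int i = 1; i <= docs.Paragraphs.Count; i++)
            {
                totaltext.Append(docs.Paragraphs[i].Range.Text).Append("<br />");
            }

            Response.Write(totaltext.ToString());
        }
        finally
        {
            // close the document and quit Word even if reading fails,
            // otherwise WINWORD.EXE keeps running on the server
            if (docs != null) docs.Close();
            word.Quit();
        }
    }
}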
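The loop above appends Range.Text verbatim, so each paragraph keeps the carriage return Word uses as its paragraph mark, empty paragraphs turn into blank lines, and the document text reaches the page without HTML encoding. A sketch of normalizing the paragraph texts before any chapter matching; ParagraphText, Clean and ToHtml are hypothetical names, and the set of trailing characters stripped (paragraph mark \r, table-cell marker \a, whitespace) is an assumption:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class ParagraphText
{
    // Characters Word commonly leaves at the end of Range.Text:
    // '\r' = paragraph mark, '\a' (char 7) = end-of-cell marker, plus stray whitespace.
    private static readonly char[] TrailingJunk = { '\r', '\n', '\a', '\t', ' ' };

    // Returns the trimmed, non-empty paragraph texts in document order.
    public static List<string> Clean(IEnumerable<string> rawParagraphs)
    {
        var result = new List<string>();
        foreach (string raw in rawParagraphs)
        {
            if (raw == null) continue;
            string text = raw.TrimEnd(TrailingJunk).Trim();
            if (text.Length > 0) result.Add(text);
        }
        return result;
    }

    // Joins paragraphs for display, HTML-encoding each one so document
    // text cannot inject markup into the page.
    public static string ToHtml(IEnumerable<string> paragraphs)
    {
        var sb = new StringBuilder();
        foreach (string p in paragraphs)
            sb.Append(WebUtility.HtmlEncode(p)).Append("<br />");
        return sb.ToString();
    }
}
```

In Page_Load one would collect docs.Paragraphs[i].Range.Text for i = 1..Count into a List<string> and write Response.Write(ParagraphText.ToHtml(ParagraphText.Clean(raw))). Keeping the string handling out of the interop code also makes it testable without Word installed.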

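Update #1

  1. Chapters can be identified by their headings.
  2. "alert or disservices" is preceded only by a typed hyphen character.
  3. Each new paragraph starts with a typed hyphen.
  4. There is no hard return / paragraph mark inside the alert block.
  5. I created one table per chapter, with column names matching the subheading titles, but if there is a better solution, you are welcome to suggest it.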
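Given the rules above, the paragraph texts can be grouped in memory before anything touches the database: a line matching "Chapter N - Name" (hyphen or en dash) starts a chapter, a line starting with a hyphen starts a subheading, and every other line is content of the current subheading. A sketch under those assumptions; it additionally assumes that content paragraphs never begin with a hyphen, and the types Chapter, Section and DocxOutline are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One subheading ("- alert or disservices") and the paragraphs under it.
public sealed class Section
{
    public string Subheading { get; }
    public List<string> Paragraphs { get; } = new List<string>();
    public Section(string subheading) { Subheading = subheading; }
    public string Content => string.Join("\n", Paragraphs);
}

// One "Chapter N - Name" heading and its subheadings.
public sealed class Chapter
{
    public int Number { get; }
    public string Name { get; }
    public List<Section> Sections { get; } = new List<Section>();
    public Chapter(int number, string name) { Number = number; Name = name; }
}

public static class DocxOutline
{
    // Accepts a hyphen or an en dash (U+2013) between number and name.
    private static readonly Regex ChapterLine =
        new Regex(@"^Chapter\s+(\d+)\s*[-\u2013]\s*(.+)$", RegexOptions.IgnoreCase);

    // Input: one string per paragraph, in document order.
    public static List<Chapter> Parse(IEnumerable<string> paragraphs)
    {
        var chapters = new List<Chapter>();
        Chapter current = null;
        Section section = null;

        foreach (string raw in paragraphs)
        {
            if (string.IsNullOrWhiteSpace(raw)) continue;
            string line = raw.Trim();

            Match m = ChapterLine.Match(line);
            if (m.Success)
            {
                current = new Chapter(int.Parse(m.Groups[1].Value), m.Groups[2].Value.Trim());
                chapters.Add(current);
                section = null;
            }
            else if (line[0] == '-' && current != null)
            {
                // subheading: a typed hyphen followed by the title
                section = new Section(line.TrimStart('-', ' '));
                current.Sections.Add(section);
            }
            else if (section != null)
            {
                section.Paragraphs.Add(line);
            }
            // text before the first chapter/subheading is ignored
        }
        return chapters;
    }
}
```

Each Chapter then maps to one row: Name selects the table and each Section supplies the value for the column named after its Subheading. Parsing first and writing second keeps the Word interop calls out of the database loop.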

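I would like to share the .docx file for you to download, but I don't know how.

I tried WeTransfer, but it was not approved because it is an untrusted source.

Update #2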

using System;
using System.Configuration;
using MySql.Data.MySqlClient;
using Word = Microsoft.Office.Interop.Word;

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        var wdApp = new Word.Application();
        Word.Document doc = null;
        string constr = ConfigurationManager.ConnectionStrings["cn"].ConnectionString;

        // MySQL Connector/NET binds named parameters (@name), not positional "?" markers
        const string strSql = @"INSERT INTO Chapters (chapter, subheading, contents) VALUES (@chapter, @subheading, @contents)";

        try
        {
            doc = wdApp.Documents.Open(@"C:\file.docx", ReadOnly: true);

            Word.Range ran = doc.Content;
            Word.Find fin = ran.Find;
            fin.ClearFormatting();
            fin.MatchWildcards = false;
            fin.Text = "";
            fin.Format = true;              // search by formatting only
            fin.set_Style("Heading 1");     // the STYLE NAME of the chapter headings (assumed Heading 1), not the heading text
            fin.Execute();

            while (fin.Found)
            {
                // on success, Find redefines "ran" to the found heading
                string chap = ran.Text;

                // cut off "Chapter " (8 chars) from the start, clean trailing carriage returns etc.
                chap = chap.Substring(8).TrimEnd('\r', '\n', '\t', ' ');

                // the heading ends with a paragraph mark; the next paragraph is the subheading, e.g. "- alert or disservices"
                ran = doc.Range(ran.End, ran.End).Paragraphs[1].Range;
                string subhead = ran.Text.TrimStart(' ', '-').TrimEnd('\r', ' ');

                // the paragraph after the subheading holds its contents
                // (only the FIRST subheading of each chapter is read here)
                ran = doc.Range(ran.End, ran.End).Paragraphs[1].Range;
                string contents = ran.Text.TrimEnd('\r', ' ');

                // write to db; the command must be bound to the connection
                using (var con = new MySqlConnection(constr))
                using (var cmd = new MySqlCommand(strSql, con))
                {
                    cmd.Parameters.AddWithValue("@chapter", chap);
                    cmd.Parameters.AddWithValue("@subheading", subhead);
                    cmd.Parameters.AddWithValue("@contents", contents);
                    con.Open();
                    cmd.ExecuteNonQuery();
                }

                // search again from the end of the contents to the end of the document
                ran = doc.Range(ran.End, doc.Content.End);
                fin = ran.Find;
                fin.ClearFormatting();
                fin.MatchWildcards = false;
                fin.Text = "";
                fin.Format = true;
                fin.set_Style("Heading 1");
                fin.Execute();
            }
        }
        finally
        {
            // always release Word, otherwise WINWORD.EXE keeps running on the server
            if (doc != null) doc.Close(false);
            wdApp.Quit();
        }
    }
}
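In the code above, chap.Substring(8) only strips the leading "Chapter ", so the stored chapter value is "1 - Events" rather than "Events", and Substring throws if the found text is shorter than eight characters. A regex that accepts either a hyphen or an en dash between number and name, and tolerates the trailing paragraph mark, is a more forgiving alternative. This is a sketch; ChapterHeading.TryParse is a hypothetical helper:

```csharp
using System.Text.RegularExpressions;

public static class ChapterHeading
{
    // Matches "Chapter 1 - Events" and "Chapter 2 – Safety" (hyphen or en dash);
    // the trailing \s* absorbs the '\r' that Word includes in Range.Text.
    private static readonly Regex Pattern =
        new Regex(@"^\s*Chapter\s+(\d+)\s*[-\u2013]\s*(.*?)\s*$", RegexOptions.IgnoreCase);

    // Returns false (instead of throwing) when the text is not a chapter heading.
    public static bool TryParse(string rangeText, out int number, out string name)
    {
        number = 0;
        name = null;
        if (rangeText == null) return false;

        Match m = Pattern.Match(rangeText);
        if (!m.Success || m.Groups[2].Value.Length == 0) return false;
        if (!int.TryParse(m.Groups[1].Value, out number)) return false;

        name = m.Groups[2].Value;
        return true;
    }
}
```

Inside the while loop, ChapterHeading.TryParse(ran.Text, out int number, out string name) would replace the Substring call, and a false result means the found paragraph is not a chapter heading and can be skipped.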
