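Problem description
I want to read a .docx file line by line and insert each chapter's sections into a MySQL database table.
I tried using Microsoft.Office.Interop.Word to read the document.
My file.docx is divided into chapters and subheadings.
Structure of file.docx: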
Chapter 1 - Events
-alert or disservices
-significant activities
Chapter 2 – Safety
-near miss
-security checks
Chapter 3 – Training
-environment
-upkeep
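When debugging in Visual Studio, the whole document is read correctly.
In the MySQL table, however, every row contains the entire document, with no distinction between chapters and paragraphs.
How can I fix this?
Thanks in advance for any help or suggestions.
My code and the table structure are below.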
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        Application word = new Application();
        object miss = Missing.Value;
        object path = @"C:\Users\file.docx";
        object readOnly = true;
        Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss);
        string totaltext = ""; // the whole document
        var ran = docs.Content;
        for (int i = 0; i < docs.Paragraphs.Count; i++)
        {
            var chap = ran.Text;
            var subhead = ran.Text;
            var contents = ran.Text;
            string constr = ConfigurationManager.ConnectionStrings["cn"].ConnectionString;
            string strsql = @"INSERT INTO Chapters (chapter,subheading,contents) VALUES (?,?,?);";
            using (MySqlConnection conn = new MySqlConnection(constr))
            {
                conn.Open();
                using (MySqlCommand cmd = new MySqlCommand(strsql, conn))
                {
                    cmd.Parameters.AddWithValue("param1", chap);
                    cmd.Parameters.AddWithValue("param2", subhead);
                    cmd.Parameters.AddWithValue("param3", contents);
                    cmd.ExecuteNonQuery();
                }
                conn.Close();
            }
            totaltext += docs.Paragraphs[i + 1].Range.Text.ToString() + "<br />";
        }
        Response.Write(totaltext);
        docs.Close();
        word.Quit();
    }
}
DROP TABLE IF EXISTS `chapters`;
CREATE TABLE `chapters` (
  `chapter` longtext CHARACTER SET utf8 COLLATE utf8_general_ci,
  `subheading` longtext CHARACTER SET utf8 COLLATE utf8_general_ci,
  `contents` longtext CHARACTER SET utf8 COLLATE utf8_general_ci,
  `sID` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`sID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Solution
This code is wrong:
    var ran = docs.Content;
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        var chap = ran.Text;
        var subhead = ran.Text;
        var contents = ran.Text;
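You set ran to the entire content of the document and then copy it into the chap, subhead and contents variables, which are the values inserted into each database row. Because ran never changes inside the loop, every row receives the whole document.
Each time you insert a row, chap, subhead and contents need to be set to different values (presumably based on the current paragraph).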
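One possible way to act on this advice is to remember the current chapter while walking the paragraphs, and to produce one row per subheading. The sketch below is only an illustration and rests on assumptions that are not in the original post: chapter lines start with "Chapter", subheadings start with "-" (as in the sample file.docx structure), and the names ChapterParser, Row and Parse are hypothetical. It works on plain strings so it can run without Word installed. In the page code, each string would come from docs.Paragraphs[i].Range.Text (the Paragraphs collection is 1-based), and each Row would supply the values for one parameterized INSERT.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original post.
public static class ChapterParser
{
    // One database row: the chapter a subheading belongs to, plus the subheading text.
    public sealed class Row
    {
        public string Chapter { get; set; }
        public string Subheading { get; set; }
    }

    // Assumption: chapter lines start with "Chapter" and subheadings start with "-",
    // matching the sample structure shown in the question.
    public static List<Row> Parse(IEnumerable<string> paragraphs)
    {
        var rows = new List<Row>();
        string currentChapter = null;

        foreach (string raw in paragraphs)
        {
            // Word paragraph text ends with a paragraph mark ('\r'); Trim removes it.
            string text = (raw ?? string.Empty).Trim();
            if (text.Length == 0)
            {
                continue; // skip empty paragraphs
            }

            if (text.StartsWith("Chapter", StringComparison.OrdinalIgnoreCase))
            {
                // A new chapter begins; later subheadings belong to it.
                currentChapter = text;
            }
            else if (text.StartsWith("-", StringComparison.Ordinal) && currentChapter != null)
            {
                rows.Add(new Row
                {
                    Chapter = currentChapter,
                    Subheading = text.TrimStart('-').Trim()
                });
            }
        }

        return rows;
    }

    public static void Main()
    {
        // Sample input mimicking what Range.Text returns for each paragraph.
        var sample = new[]
        {
            "Chapter 1 - Events\r",
            "-alert or disservices\r",
            "-significant activities\r",
            "Chapter 2 – Safety\r",
            "-near miss\r",
            "-security checks\r"
        };

        foreach (ChapterParser.Row row in Parse(sample))
        {
            Console.WriteLine(row.Chapter + " | " + row.Subheading);
        }
    }
}
```

The contents column is left out here because the sample structure shows no body text under the subheadings. If the real document has body paragraphs, they could be appended to the most recent row in a third branch of the same loop.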