Using TF-IDF

Problem description

I need to build a recommender system based on course descriptions. My JSON course dataset looks like this:

{"lang": "en","name": "Accounting Cycle: The Foundation of Business Measurement and Reporting","cat": "3/business_management|6/economics_finance","provider": "Canvas Network","id": 4,"desc": "This course introduces the basic financial statements used by most businesses,as well as the essential tools used to prepare them. This course will serve as a resource to help business students succeed in their upcoming university-level accounting classes,and as a refresher for upper division accounting students who are struggling to recall elementary concepts essential to more advanced accounting topics. Business owners will also benefit from this class by gaining essential skills necessary to organize and manage information pertinent to operating their business. At the conclusion of the class,students will understand the balance sheet,income statement,and cash flow statement. They will be able to differentiate between cash basis and accrual basis techniques,and know when each is appropriate. They\u2019ll also understand the accounting equation,how to journalize and post transactions,how to adjust and close accounts,and how to prepare key financial reports. All material for this class is written and delivered by the professor,and can be previewed here. Students must have access to a spreadsheet program to participate."}
{"lang": "en","name": "American Counter Terrorism Law","cat": "11/law","id": 5,"desc": "This online course will introduce you to American laws related to terrorism and the prevention of terrorism. My approach to the topic is the case-study method. Each week,we will read a case study,along with the statutes,regulations,and other law-related materials relevant to the case. We\u2019ll see how the case was handled in court and what reforms were enacted following the trial. Each week\u2019s assignment will include copies of the relevant laws and court rules,a glossary of terms,background readings,and other supplementary materials. The course will commence with the first attempt by Islamic militants to bring down the World Trade Center towers with a truck bomb in 1993. From there,I'll take you through the major terrorist incidents of the past 20 years,including acts perpetrated by homegrown terrorists,such as the Oklahoma City bombing of 1995 and the trial of the SHAC Seven (animal rights) terrorists in Trenton (NJ) in 2006. required materials: The textbook for this course is Counter Terrorism Issues: Case Studies in the Courtroom,by Jim Castagnera (estimated cost: $100) Find it at CRC Press"}
{"lang": "fr","name": "Arithm\u00e9tique: en route pour la cryptographie","cat": "5/computer_science|15/mathematics_statistics_and_data_analysis","id": 6,"desc": "This course is taught in french Vous voulez comprendre l'arithm\u00e9tique ? Vous souhaitez d\u00e9couvrir une application des math\u00e9matiques \u00e0 la vie quotidienne ? Ce cours est fait pour vous ! De niveau premi\u00e8re ann\u00e9e d'universit\u00e9,vous apprendrez les bases de l'arithm\u00e9tique (division euclidienne,th\u00e9or\u00e8me de B\u00e9zout,nombres premiers,congruence). Vous vous \u00eates d\u00e9j\u00e0 demand\u00e9 comment sont s\u00e9curis\u00e9es les transactions sur Internet ? Vous d\u00e9couvrirez les bases de la cryptographie,en commen\u00e7ant par les codes les plus simples pour aboutir au code RSA. Le code RSA est le code utilis\u00e9 pour crypter les communications sur internet. Il est bas\u00e9 sur de l'arithm\u00e9tique assez simple que l'on comprendra en d\u00e9tail. Vous pourrez en plus mettre en pratique vos connaissances par l'apprentissage de notions sur le langage de programmation Python. Vous travaillerez \u00e0 l'aide de cours \u00e9crits et de vid\u00e9os,d'exercices corrig\u00e9s en vid\u00e9os,des quiz,des travaux pratiques. Le cours est enti\u00e8rement gratuit !"}
{"lang": "en","name": "Becoming a Dynamic Educator","cat": "14/social_sciences","id": 7,"desc": "We live in a digitally connected world. The way information is generated,shared,processed and distributed is significantly impacting how we learn. If you have a passion for teaching and learning,and want to be an awesome educator now and in the coming decades,this course is for you! We will begin the journey of becoming a dynamic educator for the digital age. This course will last 6-8 weeks and will probably take 3-4 hours of your time each week,if you want to earn the certificate of completion. But,no pressure,jump in and out as you like. It\u2019s all about the learning!"}
{"lang": "en","name": "Bioethics","cat": "2/biology_life_sciences","id": 8,"desc": "This self-paced course is designed to show that ethical theories can help provide frameworks for moral judgment and decision-making in the wake of recent scientific,technological,and social developments which have resulted in rapid changes in the biological sciences and in health care. This course also presents the academic foundations and historical development of multicultural moral decision-making and helps the student to develop their ability to interrelate reflectively,responsibly,and respectfully with a society of increasing intercultural connections. As grammar first describes how language is used,and then is in a position to prescribe how language ought to be used,is very similar to the approach taken in this course. This course first describes how people do in fact approach moral decision-making,and then is in a position to prescribe how multicultural and intercultural moral decision-making ought to made. Some of the topics to be covered are: Institutional Review Boards (IRB),Moral Development,Kant,Mill,Rawls,Informed Consent,Competency,information disclosure,Research on Human subjects,Principlism,and Food Systems.  required materials: Bioethics: Moral Philosophy,by Jeffrey W. Bulger,published by Plato\u2019s Press,2013. Cost: $49.96 Purchase at:  http://bioethics.me/ The text for this course (digital access) is required - you will not be able to complete the course without it. Please allow 24 hours for reading rights to become effective after submitting your Access Code for the textbook."}
{"lang": "en","name": "College Foundations: Reading,Writing,and Math","cat": "9/humanities|15/mathematics_statistics_and_data_analysis","id": 9,"desc": "This game-based course provides prospective students with a primer in college level reading,writing,and mathematics. Whether preparing to take a standardized placement test or simply improving readiness to handle college-level work,this course can help student build mastery and confidence. Students may choose to work at their own pace across all three subject areas,or to select individual content areas. Pretests will determine any learning deficits,which can then be mastered through self-paced learning modules. Not forgetting the importance of the human touch,this course is overseen by a trio of reading,and mathematics professors who will be available to assist and encourage students along their journey to college readiness."}
{"lang": "en","name": "Digital Literacies I","id": 10,"desc": "What\u2019s in your digital teaching toolbox? Do you have the tools you need to reach 21st century learners? This course will introduce you to digital technologies and show you how to integrate them into your classroom/webspace."}
{"lang": "en","name": "Digital Literacies II","id": 11,"desc": "The goal of the Digital Literacy 2 course is to provide practicing and pre-service educators as well as others with the tools and knowledge necessary to enhance and enrich the educational experiences of students\u2019 through digital technologies."}
{"lang": "en","name": "Digital Tools for the K-12 Educator","id": 12,"desc": "Ready to explore Web-based tools to ignite student engagement in your K-12 classroom? This course examines various Web tools,reasons for using these tools in the classroom,and encourages you to experiment with the tools. Each week we will explore different instructional methods and utilize emerging technology to develop presentations,posters,organization tools,stories,and scavenger hunts. We will investigate uses and good practices for both teacher-led and student-driven activities through the use of free Web-based tools like Prezi,Wordle,Padlet,Voki,and more!"}
{"lang": "en","name": "Discover Your Value: Turning Experience into College Credit","id": 13,"desc": "This self-paced course provides participants with the opportunity to explore,assess,and document learning mastered through a variety of life experiences."}
{"lang": "en","name": "Enhancing Patient Safety through Interprofessional Collaborative Practice","cat": "12/medicine_health","id": 14,"desc": "What is \u201cinterprofessional collaborative practice\u201d and why does it matter to you? The term \u201cinterprofessional collaborative practice\u201d is prevalent in the healthcare environment today. Whether you are a nurse,pharmacist,doctor,healthcare student,or just interested in how to better care for loved ones,this experience will offer opportunities for you to develop the type of collaborative skills necessary to improve patient safety. We will also challenge you to evaluate and improve the level of collaboration in your work setting in the first ever MOOC2Degree Course. This course focuses on helping nurses and other healthcare professionals improve patient safety by developing the competencies associated with interprofessional collaborative practice. Topics to be covered include: \u2022 What is interprofessional collaborative practice? \u2022 How can interprofessional collaborative practice improve patient safety? \u2022 What competencies associated with interprofessional collaborative practice should all healthcare professionals have? \u2022 How can you develop the core competencies of interprofessional collaborative practice? \u2022 How can you create an environment for effective interprofessional collaborative practice in your work setting? RN to BSN MOOC2Degree Credit by Exam Opportunity: What makes this course truly unique is that for the first time as a MOOC2Degree participant,you could receive course credit by exam toward your RN to BSN degree at The University of Texas at Arlington College of Nursing. To receive credit for this MOOC course,you need to: 1. Successfully complete the MOOC2Degree course with a score of 80% or higher on all 6 self-assessments 2. Complete the online proctored exam within 7 days after the course ends with a score of 70% or higher and pay a nominal fee ($17.50-$26.50) for the online exam 3. Apply and be accepted to the UT Arlington College of Nursing for the RN to BSN Program,see the admissions criteria here. 4. After acceptance to the RN to BSN program,you will then request and be awarded credit for the MOOC2Degree course by UT Arlington College of Nursing,which requires a $25 processing fee. Complete this questionnaire to find out if you qualify to earn credit for the MOOC2Degree course. Review the terms and conditions for full details on receiving course credit for the MOOC2Degree course. Materials: All learning materials are embedded in the course or available online at no cost. The following resource will be used extensively: Core Competencies for Interprofessional Collaborative Practice: Report of an expert panel To provide multiple perspectives on interprofessional collaborative practice,your professor,Dr. Beth Mancini,has incorporated guest speakers from various healthcare roles,settings,and backgrounds."}
{"lang": "en","name": "Ethics and Values in a Multicultural World","cat": "16/languages","id": 15,"desc": "This course presents the academic foundations and historical development of multicultural moral decision-making and helps students develop their ability to interrelate reflectively,and respectfully with a society of increasing intercultural connections. Students will first explore how people approach moral decision-making,and then how multicultural and intercultural moral decision-making ought to be made. This approach is analogous to how grammar first describes the way language is in fact used,and how it then prescribes the way language ought to be used. A blend of online instructional strategies will be utilized throughout this course. Students can expect to spend three to six hours per week to complete and submit all course deliverables. Preparation for exams will require additional time. Upon successful completion of this course,students should have the ability to engage in serious reflection on issues of ethics and values related to intercultural and multicultural decision-making. Required Text: $49.99 Jeffrey W. Bulger,MORAL PHILOSOPHY: A Theoretical and Practical Approach to Moral Decision-Making,Vol 1-8,Plato\u2019s Press,2013. Purchase the book at: http://platospress.net/ Please allow 24 hours for reading rights to become effective after submitting your Access Code. Note: Purchasing the text is one of the prerequisites,along with getting a 100% on the syllabus quiz and submitting the Orientation Course Assessment before WEEK ONE will unlock."}
{"lang": "en","name": "Exploring Chemistry","cat": "4/chemistry","id": 16,"desc": "Chemistry is an integral part of our lives and the world we live in. Chemistry explains the world around us. Are you a college student intimidated by a chemistry course? Do you need a head start in exploring chemistry in order to be prepared for general chemistry courses? In this pre-college course,students will be introduced to the fundamentals of chemistry. Concepts,terminologies,and basic mathematics skills required for conversions in chemistry will be covered. This basic chemistry course is recommended for McHenry County College\u2019s students prior to enrolling in CHM164: Introductory Chemistry"}
{"lang": "en","name": "Exploring Engineering","cat": "8/engineering_technology","id": 17,"desc": "Are you considering a career in engineering? Are you fascinated by what engineers do? In this pre-college course,you will gain an understanding of the various fields of engineering and explore the engineering design process,from conceptual design and optimal choice evaluation to project construction."}
{"lang": "en","name": "Fairy Tales: Origins and Evolution of Princess Stories","cat": "1/arts_music_film","id": 18,"desc": "Princess stories have been popular for centuries and remain so today around the world; we\u2019ll dive into what these fairy tales mean,and trace the history of these narratives back to their source material,examining contexts all along the way. We\u2019ll borrow tools from cultural studies,literature studies,and film studies to help us analyze these phenomena and what they mean to our society. Many of us may associate princess stories with modern-day products (much of it marketed to small children) or with Disney movies and theme parks. We\u2019ll examine these current versions of fairy tale mythos as well,using our new interpretive tools to uncover not just what\u2019s been changed in the moral and message of the narrative,but what the stories mean as told now."}
.............

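I also have a few source course IDs (for example 93, 106, 108). I am supposed to compute TF-IDF over the thousands of courses in the dataset and, using the desc column, find the other courses that are most similar to each source course.

I have already computed the TF-IDF vectors, but I don't know how to compare them against the other courses in the dataset.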

import org.apache.spark.sql.DataFrame
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Read the dataset: one JSON record per line.
val df: DataFrame = spark.read.json("C:/Users/.../Desktop/BigData/DO_record_per_line.json")

// Split each description into lowercase, whitespace-separated words.
val tokenizer = new Tokenizer().setInputCol("desc").setOutputCol("words")
val wordsData = tokenizer.transform(df)

// Hash the words into a fixed-size term-frequency vector.
val hashingTF = new HashingTF()
  .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(10000)

val featurizedData = hashingTF.transform(wordsData)

// Rescale the term frequencies by inverse document frequency.
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)

val rescaledData = idfModel.transform(featurizedData)
rescaledData.head

How can I find the courses most similar to a given course, based on its description?

Solution
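
One possible approach, offered as a sketch rather than a confirmed answer from the original thread: TF-IDF vectors are commonly compared with cosine similarity. If every vector is first L2-normalized, the cosine similarity of two courses is just the dot product of their vectors, so the source courses can be cross-joined against all courses and ranked by that score. The helper name `topSimilar`, its parameters, and the column names `normFeatures`, `srcId`, `srcVec`, `cosine` and `rank` are assumptions made for illustration. The input is assumed to look like `rescaledData` from the question (columns `id`, `name`, `features`), and the code is meant to be pasted into spark-shell (Spark 2.1 or later, for `crossJoin`).

```scala
import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{broadcast, col, row_number, udf}

// Hypothetical helper (not from the question): for each source course id, return
// the n other courses whose TF-IDF vectors have the highest cosine similarity.
// `tfidf` is assumed to have the columns "id", "name" and "features".
def topSimilar(tfidf: DataFrame, sourceIds: Seq[Long], n: Int): DataFrame = {
  // After L2 normalization, the dot product of two vectors is their cosine similarity.
  val normalized = new Normalizer()
    .setInputCol("features")
    .setOutputCol("normFeatures")
    .setP(2.0)
    .transform(tfidf)
    .select("id", "name", "normFeatures")

  // Dot product that only visits the non-zero entries of the first vector.
  val dot = udf { (a: Vector, b: Vector) =>
    val sa = a.toSparse
    sa.indices.zip(sa.values).map { case (i, v) => v * b(i) }.sum
  }

  // The few source courses, renamed so they can be joined against all courses.
  val sources = normalized
    .filter(col("id").isin(sourceIds: _*))
    .select(col("id").as("srcId"), col("normFeatures").as("srcVec"))

  // Score every (course, source) pair, skipping a source paired with itself.
  // Broadcasting the small side avoids shuffling the full dataset.
  val scored = normalized
    .crossJoin(broadcast(sources))
    .filter(col("id") =!= col("srcId"))
    .withColumn("cosine", dot(col("srcVec"), col("normFeatures")))

  // Keep the n best-scoring courses per source course.
  val byScore = Window.partitionBy("srcId").orderBy(col("cosine").desc)
  scored
    .withColumn("rank", row_number().over(byScore))
    .filter(col("rank") <= n)
    .select("srcId", "rank", "id", "name", "cosine")
}
```

With the question's data this could be called as `topSimilar(rescaledData, Seq(93L, 106L, 108L), 5).show(false)`. The cross join scores each source course against every other course, which is reasonable for a handful of source IDs and a few thousand courses. Comparing all courses with each other at a much larger scale would likely call for an approximate method instead.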
