Azure.Search.Documents: adding custom analyzers, tokenizers, and token filters

Problem description

I am migrating from the Azure Search SDK Microsoft.Azure.Search (v10) to Azure.Search.Documents (v11).

With v10, we could use the C# SDK to create an index with custom analyzers, tokenizers, and so on, like this:

var index = new Microsoft.Azure.Search.Models.Index(
    name: GetIndexName(),
    defaultScoringProfile: defaultScoringProfile,
    fields: AzureQuestionItemDefinition.GetQuestionItemFieldsDefinition(),
    analyzers: new[]
    {
        new CustomAnalyzer
        {
            Name = "standardAnalyzer",
            Tokenizer = TokenizerName.Standard,
            TokenFilters = new[]
            {
                TokenFilterName.Lowercase,
                TokenFilterName.AsciiFolding,
                TokenFilterName.Phonetic,
            }
        },
        new CustomAnalyzer
        {
            Name = "prefixAnalyzer",
            // ... (tokenizer and earlier token filters are cut off in the original post)
            TokenFilters = new[]
            {
                // ...
                "edgeNgramTokenFilter"
            }
        },
    },
    tokenFilters: new[]
    {
        new EdgeNGramTokenFilterV2("edgeNgramTokenFilter", minGram: 2, maxGram: 10, EdgeNGramTokenFilterSide.Front),
    },
    scoringProfiles: new[]
    {
        new ScoringProfile(defaultScoringProfile)
        {
            TextWeights = new TextWeights()
            {
                Weights = new Dictionary<string, double>()
                {
                    { nameof(QuestionItem.Text), 5.0 },
                    { nameof(QuestionItem.Context), /* weight missing in the original post */ },
                    { $"{nameof(QuestionItem.Asker)}/{nameof(QuestionItem.Asker.Name)}", 3.0 },
                    { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.Text)}", 2.0 },
                    { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.AnswererName)}", 2.0 }
                }
            }
        }
    });

After moving to the new Azure.Search.Documents v11, I cannot find a way to create an index like this with the C# SDK.

I found that the relevant SearchIndex properties are read-only:

//
    // Summary:
    //     Represents a search index definition, which describes the fields and search behavior
    //     of an index.
    public class SearchIndex : IUtf8JsonSerializable
    {
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name is null.
        public SearchIndex(string name);
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        //   fields:
        //     Fields to add to the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name or fields is null.
        public SearchIndex(string name, IEnumerable<SearchField> fields);

        //
        // Summary:
        //     The name of the scoring profile to use if none is specified in the query. If
        //     this property is not set and no scoring profile is specified in the query, then
        //     default scoring (tf-idf) will be used.
        public string DefaultScoringProfile { get; set; }
        //
        // Summary:
        //     Options to control Cross-Origin Resource Sharing (CORS) for the index.
        public CorsOptions CorsOptions { get; set; }
        //
        // Summary:
        //     A description of an encryption key that you create in Azure Key Vault. This key
        //     is used to provide an additional level of encryption-at-rest for your data when
        //     you want full assurance that no one, not even Microsoft, can decrypt your data
        //     in Azure Cognitive Search. Once you have encrypted your data, it will always
        //     remain encrypted. Azure Cognitive Search will ignore attempts to set this property
        //     to null. You can change this property as needed if you want to rotate your encryption
        //     key; Your data will be unaffected. Encryption with customer-managed keys is not
        //     available for free search services, and is only available for paid services created
        //     on or after January 1, 2019.
        public SearchResourceEncryptionKey EncryptionKey { get; set; }
        //
        // Summary:
        //     The type of similarity algorithm to be used when scoring and ranking the documents
        //     matching a search query. The similarity algorithm can only be defined at index
        //     creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity
        //     algorithm is used.
        public SimilarityAlgorithm Similarity { get; set; }
        //
        // Summary:
        //     Gets the name of the index.
        [CodeGenMemberAttribute("name")]
        public string Name { get; }
        //
        // Summary:
        //     Gets the analyzers for the index.
        public IList<LexicalAnalyzer> Analyzers { get; }
        //
        // Summary:
        //     Gets the character filters for the index.
        public IList<CharFilter> CharFilters { get; }
        //
        // Summary:
        //     Gets or sets the fields in the index. Use Azure.Search.Documents.Indexes.FieldBuilder
        //     to define fields based on a model class, or Azure.Search.Documents.Indexes.Models.SimpleField,
        //     Azure.Search.Documents.Indexes.Models.SearchableField, and Azure.Search.Documents.Indexes.Models.ComplexField
        //     to manually define fields. Index fields have many constraints that are not validated
        //     with Azure.Search.Documents.Indexes.Models.SearchField until the index is created
        //     on the server.
        public IList<SearchField> Fields { get; set; }
        //
        // Summary:
        //     Gets the scoring profiles for the index.
        public IList<ScoringProfile> ScoringProfiles { get; }
        //
        // Summary:
        //     Gets the suggesters for the index.
        public IList<SearchSuggester> Suggesters { get; }
        //
        // Summary:
        //     Gets the token filters for the index.
        public IList<TokenFilter> TokenFilters { get; }
        //
        // Summary:
        //     Gets the tokenizers for the index.
        public IList<LexicalTokenizer> Tokenizers { get; }
        //
        // Summary:
        //     The Azure.ETag of the Azure.Search.Documents.Indexes.Models.SearchIndex.
        public ETag? ETag { get; set; }
    }

My question is: how do I set custom Tokenizers, TokenFilters, ScoringProfiles, and so on?

Solution

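In the new Azure .NET client libraries, collection properties are initialized by default. Although you cannot assign to these properties, you can still call Add on each of them: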

var index = new SearchIndex("myindex");
index.ScoringProfiles.Add(new ScoringProfile(...));

Personally, I find this less convenient because I like to write expression-based code, so I have passed this feedback on to the Azure SDK team.
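
For the index in the question, a v11 version might look roughly like the sketch below. It relies on the fact that a C# collection initializer only calls Add, so it also works on get-only collection properties, which keeps the code expression-based. Several things here are assumptions rather than part of the original post: the class and method names (`IndexDefinitionSketch`, `Build`) are hypothetical, the tokenizer of `prefixAnalyzer` is a guess because that part of the question is cut off, and the weighted field names are plain string placeholders for the `nameof(...)` expressions in the question.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes.Models;

public static class IndexDefinitionSketch
{
    // Hypothetical helper: builds the index definition locally, without calling the service.
    public static SearchIndex Build(
        string indexName,
        string scoringProfileName,
        IEnumerable<SearchField> fields)
    {
        return new SearchIndex(indexName, fields)
        {
            DefaultScoringProfile = scoringProfileName,

            // Get-only collection: the initializer calls Analyzers.Add(...) for each item.
            Analyzers =
            {
                new CustomAnalyzer("standardAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                    },
                },
                // Assumption: the tokenizer is a guess; the original definition is truncated.
                new CustomAnalyzer("prefixAnalyzer", LexicalTokenizerName.Standard)
                {
                    // A custom token filter is referenced by its name.
                    TokenFilters = { "edgeNgramTokenFilter" },
                },
            },

            TokenFilters =
            {
                new EdgeNGramTokenFilter("edgeNgramTokenFilter")
                {
                    MinGram = 2,
                    MaxGram = 10,
                    Side = EdgeNGramTokenFilterSide.Front,
                },
            },

            ScoringProfiles =
            {
                new ScoringProfile(scoringProfileName)
                {
                    // Placeholder field names and a subset of the weights from the question.
                    TextWeights = new TextWeights(new Dictionary<string, double>
                    {
                        ["Text"] = 5.0,
                        ["Asker/Name"] = 3.0,
                        ["Answers/Text"] = 2.0,
                    }),
                },
            },
        };
    }
}
```

The resulting `SearchIndex` can then be passed to `SearchIndexClient.CreateOrUpdateIndexAsync`. Calling `Add` on each collection, as shown in the answer, produces the same definition; the initializer form is only a stylistic alternative.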