How to Enable Partition Projection in AWS Athena
 
 Athena Partition Projection, introduced in June 2020, eliminates the need to run MSCK REPAIR TABLE to add new partitions manually.
Building Backend
The key point in the CloudFormation template is the GlueTable.TableInput.Parameters section of the GlueTable definition (lines 27–35). This section enables Partition Projection with the following important properties:
- projection.enabled: Enables partition projection.
- projection.year_month.format: Specifies the date format for partitions.
- projection.year_month.range: Defines the range for partitions, supporting- NOWwith offsets (e.g.,- NOW+9HOURS).
The projection.year_month.range property is treated as UTC. If you intend to use a different timezone, you need to specify the time difference using a format like NOW+9HOURS.
For more details, please refer to the official documentation.
AWSTemplateFormatVersion: "2010-09-09"Description: Stack for Athena partition projection sampleResources:  S3:    Type: AWS::S3::Bucket    Properties:      BucketName: athena-partition-projection-logs      BucketEncryption:        ServerSideEncryptionConfiguration:          - ServerSideEncryptionByDefault:              SSEAlgorithm: AES256
  GlueDatabase:    Type: AWS::Glue::Database    Properties:      DatabaseInput:        Name: sample      CatalogId: !Ref AWS::AccountId
  GlueTable:    Type: AWS::Glue::Table    Properties:      DatabaseName: !Ref GlueDatabase      CatalogId: !Ref AWS::AccountId      TableInput:        TableType: EXTERNAL_TABLE        Parameters:          classification: json          "projection.enabled": true          "projection.year_month.format": yyyy/MM          "projection.year_month.interval": 1          "projection.year_month.interval.unit": MONTHS          "projection.year_month.range": 2021/09,NOW          "projection.year_month.type": date          "storage.location.template": s3://athena-partition-projection-logs/${year_month}        StorageDescriptor:          Columns:            - Name: id              Type: int            - Name: message              Type: string          Location: !Sub s3://${S3}/          InputFormat: org.apache.hadoop.mapred.TextInputFormat          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat          Compressed: false          NumberOfBuckets: 0          SerdeInfo:            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe            Parameters:              paths: id,message          StoredAsSubDirectories: false        PartitionKeys:          - Name: year_month            Type: string        Retention: 0        Name: sample_logsDeploy the CloudFormation stack with the following command:
aws cloudformation deploy \  --template-file stack.yml \  --stack-name athena-partition-projection-sampleUploading Data
Upload data to the S3 bucket for testing:
echo '{"id": 1, "message": "hello"}' > 2021-09.jsonecho '{"id": 2, "message": "world"}' > 2021-10.jsonaws s3 cp 2021-09.json s3://athena-partition-projection-logs/2021/09/aws s3 cp 2021-10.json s3://athena-partition-projection-logs/2021/10/Verify that the files are uploaded:
aws s3 ls s3://athena-partition-projection-logs/2021/Querying Data
Run the following SQL query to fetch data from the 2021/09 partition:
SELECT * FROM "sample"."sample_logs"WHERE year_month = '2021/09'LIMIT 10;Expected result:
1  hello  2021/09Similarly, query the 2021/10 partition:
SELECT * FROM "sample"."sample_logs"WHERE year_month = '2021/10'LIMIT 10;Expected result:
2  world  2021/10Cleaning Up
Clean up all the AWS resources provisioned during this example with the following command:
aws s3 rm --recursive s3://athena-partition-projection-logsaws cloudformation delete-stack --stack-name athena-partition-projection-sample